Experiment with GMM-Based Artefact Localization in Czech Synthetic Speech

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9302)

Abstract

The paper describes an experiment that uses a statistical approach based on Gaussian mixture models (GMM) to localize artefacts in synthetic speech produced by the Czech text-to-speech system employing the unit selection principle. In addition, the paper analyzes how the number of GMM mixture components and the frame shift used in the spectral feature analysis influence the accuracy of the resulting artefact positions. The results of the experiments confirm that the chosen concept works properly, and the presented artefact position localizer can be used as an alternative to the standard manual localization method.
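As a rough illustration of the idea described in the abstract, the sketch below scores each spectral analysis frame by a log-likelihood ratio between two GMMs and converts the best-scoring frame index into a time position using the frame shift. This is a minimal sketch under stated assumptions, not the authors' localizer. The following are all hypothetical: the two-model setup (one GMM for artefact-free speech, one for artefacts), the diagonal covariances, the names `DiagGMM` and `localize_artefact`, and the frame-centre timing convention. The GMM parameters (trained beforehand, e.g. with EM) and the per-frame feature vectors are assumed to be given.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DiagGMM:
    """Hypothetical container for a GMM with diagonal covariance matrices."""
    weights: List[float]          # mixture weights (positive, summing to 1)
    means: List[List[float]]      # one mean vector per mixture component
    variances: List[List[float]]  # one diagonal variance vector per component

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_m w_m N(x; mu_m, diag(var_m)), via log-sum-exp."""
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(ll)
        top = max(terms)  # subtract the maximum for numerical stability
        return top + math.log(sum(math.exp(t - top) for t in terms))


def localize_artefact(
    frames: Sequence[Sequence[float]],
    clean_gmm: DiagGMM,
    artefact_gmm: DiagGMM,
    frame_shift_ms: float,
    frame_len_ms: float,
) -> Tuple[float, int]:
    """Return (time in ms, frame index) of the frame most likely to hold an artefact.

    Each frame is scored by the log-likelihood ratio
    log p(x | artefact) - log p(x | clean); the highest score wins.
    """
    scores = [artefact_gmm.log_likelihood(f) - clean_gmm.log_likelihood(f)
              for f in frames]
    best = max(range(len(scores)), key=scores.__getitem__)
    # Frame k starts at k * shift; report the centre of that frame.
    time_ms = best * frame_shift_ms + frame_len_ms / 2
    return time_ms, best


if __name__ == "__main__":
    # Toy 2-D features: "clean" frames near 0, one outlying frame near 5.
    clean = DiagGMM([0.5, 0.5], [[0.0, 0.0], [0.5, 0.5]], [[1.0, 1.0], [1.0, 1.0]])
    artefact = DiagGMM([1.0], [[5.0, 5.0]], [[1.0, 1.0]])
    frames = [[0.1, 0.0], [0.2, -0.1], [5.1, 4.9], [0.0, 0.3]]
    print(localize_artefact(frames, clean, artefact, frame_shift_ms=10, frame_len_ms=20))
    # -> (30.0, 2)
```

In this sketch the frame shift fixes the time grid that a position estimate can fall on. With a 10 ms shift, the estimate can move only in 10 ms steps. This offers one plausible intuition for why the frame-shift setting affects position accuracy. Likewise, the number of mixture components is simply the length of `weights` in each model.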

Keywords

  • Quality of synthetic speech
  • Text-to-speech system
  • GMM classification
  • Statistical analysis

The work has been supported by the Technology Agency of the Czech Republic, project No. TA 01030476, the Grant Agency of the Slovak Academy of Sciences (VEGA 2/0013/14), and the Ministry of Education of the Slovak Republic (KEGA 022STU-4/2014).

Author information

Correspondence to Jiří Přibil.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Přibil, J., Přibilová, A., Matoušek, J. (2015). Experiment with GMM-Based Artefact Localization in Czech Synthetic Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_3

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)