Abstract
This paper describes an experiment that uses a statistical approach based on Gaussian mixture models (GMMs) to localize artefacts in synthetic speech produced by a Czech text-to-speech system based on unit selection. It also analyzes how the number of GMM mixture components and the frame shift used during spectral feature analysis affect the accuracy of the detected artefact positions. The results confirm that the chosen concept works properly, and the presented artefact localizer can serve as an alternative to the manual localization method that is normally used.
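To make the role of the frame shift concrete, the following minimal sketch shows how per-frame decisions can be mapped back to time positions. It is not the authors' implementation: the function names, the 16 kHz sampling rate, the 25 ms frame length, and the 10 ms frame shift are all assumptions chosen for illustration, and the per-frame flags stand in for whatever decision a GMM-based classifier would produce.

```python
def frame_times(num_samples, sample_rate, frame_len, frame_shift):
    """Return (start_s, end_s) in seconds for every full analysis frame.

    All arguments except sample_rate are given in samples.
    """
    if num_samples < frame_len:
        return []
    count = 1 + (num_samples - frame_len) // frame_shift
    return [
        (i * frame_shift / sample_rate,
         (i * frame_shift + frame_len) / sample_rate)
        for i in range(count)
    ]


def flagged_regions(flags, times):
    """Merge runs of consecutive flagged frames into (start_s, end_s) regions."""
    regions = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                      # a new run of flagged frames begins
        elif not flagged and start is not None:
            regions.append((times[start][0], times[i - 1][1]))
            start = None
    if start is not None:                  # run extends to the last frame
        regions.append((times[start][0], times[len(flags) - 1][1]))
    return regions


# Hypothetical setup: 1 s of audio at 16 kHz, 25 ms frames, 10 ms shift.
times = frame_times(num_samples=16000, sample_rate=16000,
                    frame_len=400, frame_shift=160)

# Hypothetical classifier output: frames 10..12 marked as artefact.
flags = [10 <= i <= 12 for i in range(len(times))]
regions = flagged_regions(flags, times)
```

In this sketch the start of a detected region can only fall on a multiple of the frame shift, so a shorter shift gives a finer time grid for the reported position at the cost of more frames to analyze.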
The work has been supported by the Technology Agency of the Czech Republic, project No. TA 01030476, the Grant Agency of the Slovak Academy of Sciences (VEGA 2/0013/14), and the Ministry of Education of the Slovak Republic (KEGA 022STU-4/2014).
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Přibil, J., Přibilová, A., Matoušek, J. (2015). Experiment with GMM-Based Artefact Localization in Czech Synthetic Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_3
DOI: https://doi.org/10.1007/978-3-319-24033-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)