Experiment with GMM-Based Artefact Localization in Czech Synthetic Speech

Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

doi:10.1007/978-3-319-24033-6_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

The paper describes an experiment that uses a statistical approach based on Gaussian mixture models (GMM) to localize artefacts in synthetic speech produced by a Czech text-to-speech system employing the unit selection principle. In addition, the paper analyzes how the number of GMM mixture components and the frame shift used during spectral feature analysis influence the accuracy of the resulting artefact positions. The results of the experiments confirm that the chosen concept functions properly, and the presented artefact position localizer can be used as an alternative to the commonly applied manual localization method.
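To make the idea of frame-wise GMM localization concrete, the following is a minimal sketch and not the authors' implementation. It assumes two already trained diagonal-covariance GMMs (one for clean speech, one for artefacts), a simple "higher log-likelihood wins" decision per frame, and toy one-dimensional features; the function names, model parameters, and decision rule are all hypothetical, since the abstract does not specify them. The sketch shows how a frame index is turned into a time position through the frame shift, which is why the frame shift bounds the achievable position accuracy.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(value - peak) for value in logs))


def localize_artefacts(frames, gmm_clean, gmm_artefact, frame_shift_ms):
    """Return (start_ms, end_ms) regions where the artefact model scores higher.

    Assumption: a frame is flagged when the artefact GMM is more likely
    than the clean GMM; the paper's actual decision rule may differ.
    """
    flags = [
        gmm_log_likelihood(f, gmm_artefact) > gmm_log_likelihood(f, gmm_clean)
        for f in frames
    ]
    regions, start = [], None
    for i, flagged in enumerate(flags + [False]):  # sentinel closes a trailing region
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            regions.append((start * frame_shift_ms, i * frame_shift_ms))
            start = None
    return regions


# Hypothetical toy models: two clean components and one artefact component.
gmm_clean = [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])]
gmm_artefact = [(1.0, [5.0], [1.0])]

# Hypothetical 1-D "spectral" feature per frame; frames 2 and 3 look like artefacts.
frames = [[0.2], [0.8], [5.1], [4.9], [0.5]]

regions = localize_artefacts(frames, gmm_clean, gmm_artefact, frame_shift_ms=10)
print(regions)
```

In this sketch the number of mixture components is simply the length of each model list, and a smaller `frame_shift_ms` gives a finer time grid for the reported region boundaries at the cost of scoring more frames.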

The work has been supported by the Technology Agency of the Czech Republic, project No. TA 01030476, the Grant Agency of the Slovak Academy of Sciences (VEGA 2/0013/14), and the Ministry of Education of the Slovak Republic (KEGA 022STU-4/2014).

Author information

Authors and Affiliations

Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Jiří Přibil & Jindřich Matoušek
SAS, Institute of Measurement Science, Dúbravská cesta 9, 841 04, Bratislava, Slovakia
Jiří Přibil
Faculty of Electrical Engineering and Information Technology, Institute of Electronics and Photonics, Slovak University of Technology, Ilkovičova 3, 812 19, Bratislava, Slovakia
Anna Přibilová

Corresponding author

Correspondence to Jiří Přibil.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Přibil, J., Přibilová, A., Matoušek, J. (2015). Experiment with GMM-Based Artefact Localization in Czech Synthetic Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_3

DOI: https://doi.org/10.1007/978-3-319-24033-6_3
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)
