GMM-Based Evaluation of Emotional Style Transformation in Czech and Slovak
In developing voice conversion and emotional speech style transformation for text-to-speech systems, it is important to obtain feedback on how users perceive the quality of the resulting synthetic speech. For this reason, evaluations of synthetic speech quality must often be performed for comparison. The main aim of the experiments described in this paper was to determine whether a classifier based on Gaussian mixture models (GMMs) can be applied to evaluate male and female resynthesized speech that had been transformed from a neutral style to four emotional states (joy, surprise, sadness, and anger) spoken in Czech and Slovak. We suppose that this GMM-based statistical evaluation can be combined with, or even replace, classical evaluation in the form of listening tests. To verify this working hypothesis, a simple GMM emotional speech classifier with a one-level structure was implemented. A further task of the experiment was to investigate the influence of different types of statistical values (mean, median, standard deviation, relative maximum, etc.) of the speech features used (spectral and/or supra-segmental) on GMM classification accuracy. The obtained GMM evaluation scores are compared with the results of conventional listening tests based on mean opinion scores. In addition, the correctness of the GMM classification is analyzed with respect to the parameter settings used during GMM training: the number of mixture components and the types of speech features. The paper also describes a comparison experiment with a reference speech corpus taken from the Berlin database of emotional speech in German, used as a benchmark for evaluating the performance of our one-level GMM classifier.
The obtained results confirm the practical usability of the developed GMM classifier, so we will continue this research with the aim of increasing classification accuracy and comparing the approach with others, such as support vector machines.
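The one-level classification scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one GMM is trained per emotion class on frame-level feature vectors, and an utterance is assigned to the class whose model gives the highest log-likelihood. The feature dimensionality, the `OneLevelGMMClassifier` name, and the toy data are all hypothetical; `scikit-learn` stands in for whatever GMM training tools the authors used.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Neutral plus the four target emotional styles from the paper.
EMOTIONS = ["neutral", "joy", "surprise", "sadness", "anger"]

class OneLevelGMMClassifier:
    """One GMM per emotion; an utterance is assigned to the class whose
    model yields the highest average log-likelihood over its frames."""

    def __init__(self, n_components=4, random_state=0):
        self.models = {
            e: GaussianMixture(n_components=n_components,
                               covariance_type="diag",
                               random_state=random_state)
            for e in EMOTIONS
        }

    def fit(self, features_by_emotion):
        # features_by_emotion: dict emotion -> (n_frames, n_features) array
        # of spectral/prosodic feature vectors pooled from training utterances.
        for emotion, frames in features_by_emotion.items():
            self.models[emotion].fit(frames)
        return self

    def predict(self, utterance_frames):
        # Score the utterance's frames under every class model
        # and pick the emotion with the highest likelihood.
        scores = {e: m.score(utterance_frames)
                  for e, m in self.models.items()}
        return max(scores, key=scores.get)

# Toy demonstration with random stand-ins for 6-dimensional feature vectors;
# each class is centred at a different mean so the classes are separable.
rng = np.random.default_rng(42)
train = {e: rng.normal(loc=i, scale=1.0, size=(200, 6))
         for i, e in enumerate(EMOTIONS)}
clf = OneLevelGMMClassifier(n_components=4).fit(train)
test_utterance = rng.normal(loc=3, scale=1.0, size=(50, 6))
print(clf.predict(test_utterance))
```

The number of mixture components (`n_components`) is one of the training parameters whose influence on accuracy the paper analyzes; diagonal covariances are a common simplification when frame counts are limited.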
Keywords: Emotional speech transformation · Spectral and prosodic features of speech · GMM-based emotion classification
This work was supported by the Grant Agency of the Slovak Academy of Sciences (VEGA 2/0013/14) and the Ministry of Education of the Slovak Republic (VEGA 1/0987/12, KEGA 022STU-4/2014).