Experiment with Evaluation of Quality of the Synthetic Speech by the GMM Classifier

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8082)


This paper describes our experiment with using Gaussian mixture models (GMMs) to evaluate the quality of speech produced by different methods of speech synthesis and parameterization. In addition, the paper analyzes and compares the influence of different feature types and different numbers of mixture components on the GMM evaluation. Finally, the GMM evaluation scores are compared with the results of conventional listening tests based on mean opinion score (MOS) ratings. The results obtained by these two approaches are in agreement.


GMM classifier, spectral and prosodic features of speech, synthetic speech evaluation
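The evaluation approach described in the abstract can be illustrated with a minimal sketch: train one GMM per speech class (e.g. natural vs. synthetic) on per-frame feature vectors, then label an utterance by comparing the average log-likelihoods under each model. This is not the authors' implementation; the feature dimensions, mixture count, and `scikit-learn` usage are illustrative assumptions, and the random Gaussian "features" stand in for real spectral/prosodic vectors such as MFCCs or F0 statistics.

```python
# Sketch of GMM-based classification of speech feature vectors.
# Assumptions (not from the paper): 13-dim features, 4 diagonal-covariance
# mixture components, synthetic random data in place of extracted features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in feature matrices (frames x features) for two speech classes.
natural_feats = rng.normal(loc=0.0, scale=1.0, size=(500, 13))
synthetic_feats = rng.normal(loc=1.5, scale=1.0, size=(500, 13))

# One GMM per class; the paper compares different numbers of mixtures,
# here we arbitrarily use 4 components.
gmm_nat = GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(natural_feats)
gmm_syn = GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(synthetic_feats)

def classify(utterance_feats):
    """Label an utterance by its mean per-frame log-likelihood."""
    ll_nat = gmm_nat.score(utterance_feats)  # mean log-likelihood per frame
    ll_syn = gmm_syn.score(utterance_feats)
    return "natural" if ll_nat > ll_syn else "synthetic"

test_utt = rng.normal(loc=1.5, scale=1.0, size=(100, 13))
print(classify(test_utt))
```

In the paper's setting the per-class scores are further compared against MOS ratings from listening tests; this sketch only shows the likelihood-comparison step.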





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Faculty of Applied Sciences, Dept. of Cybernetics, University of West Bohemia, Plzeň, Czech Republic
  2. Institute of Measurement Science, SAS, Bratislava, Slovakia
  3. Faculty of Electrical Engineering & Information Technology, Institute of Electronics and Photonics, Slovak University of Technology, Bratislava, Slovakia
