Automatic Evaluation of Synthetic Speech Quality by a System Based on Statistical Analysis

  • Jiří Přibil
  • Anna Přibilová
  • Jindřich Matoušek
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

The paper describes a system for automatic evaluation of speech quality based on statistical analysis of differences in spectral properties, prosodic parameters, and time structuring within the speech signal. The proposed system was successfully tested on sentences from male and female voices produced by a speech synthesizer using the unit selection method with two different approaches to prosody manipulation. The experiments show that all three types of speech features are necessary to obtain correct, sharp, and stable results. A detailed analysis shows that the number of statistical parameters strongly influences the correctness and precision of the evaluated results. A larger amount of processed speech material has a positive impact on the stability of the evaluation process. A final comparison documents a basic correlation with the results obtained by the standard listening test.

Keywords

Listening test; Objective and subjective evaluation; Quality of synthetic speech; Statistical analysis

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Jiří Přibil (1, 2)
  • Anna Přibilová (3)
  • Jindřich Matoušek (2)
  1. Institute of Measurement Science, SAS, Bratislava, Slovakia
  2. Faculty of Applied Sciences, Department of Cybernetics, UWB, Pilsen, Czech Republic
  3. FEE & IT, Institute of Electronics and Photonics, SUT in Bratislava, Bratislava, Slovakia
