
Method of Modeling and Harmonic Synthesis of Phonemes of Human Speech with Emotional Coloring

  • AUTOMATIC TEXT PROCESSING
  • Published in Automatic Documentation and Mathematical Linguistics

Abstract

Text-to-speech synthesis is one of the most important technologies for working with human speech. This paper presents an introductory analysis of sets of a speaker's utterances recorded with different emotional coloration. The analysis reveals patterns in the frequency dynamics of the harmonics and supports the development of a method for analytically describing the emotional coloration of speech. We propose a model of the frequency changes in vowels and phonemes pronounced with an emotional connotation. The model is based on sigmoid functions and yields a technique for synthesizing the signal of emotionally colored phonemes.
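The core idea of the abstract — driving the frequencies of a phoneme's harmonics along a sigmoid trajectory — can be illustrated with a short sketch. This is not the authors' exact model; the parameter values (`f_start`, `f_end`, steepness `k`, transition center `t0`, the 1/h amplitude rolloff) are illustrative assumptions. The instantaneous frequency follows a logistic curve, and the phase is obtained by integrating (cumulatively summing) that frequency, as in standard additive synthesis:

```python
import numpy as np

def sigmoid(t, t0, k):
    # Logistic transition centered at t0 with steepness k
    return 1.0 / (1.0 + np.exp(-k * (t - t0)))

def synth_phoneme(duration=0.3, fs=16000, f_start=180.0, f_end=220.0,
                  n_harmonics=5, t0=0.15, k=60.0):
    """Synthesize a vowel-like signal whose fundamental frequency
    glides from f_start to f_end along a sigmoid trajectory.
    All numeric defaults are illustrative, not taken from the paper."""
    t = np.arange(int(duration * fs)) / fs
    f0 = f_start + (f_end - f_start) * sigmoid(t, t0, k)  # instantaneous F0, Hz
    phase = 2.0 * np.pi * np.cumsum(f0) / fs              # integrate frequency -> phase
    signal = np.zeros_like(t)
    for h in range(1, n_harmonics + 1):
        signal += (1.0 / h) * np.sin(h * phase)           # harmonics with 1/h rolloff
    return signal / np.max(np.abs(signal))                # normalize to [-1, 1]

x = synth_phoneme()  # 0.3 s of a vowel-like tone with a sigmoid pitch glide
```

Writing the signal to a WAV file and listening to it shows the perceptual effect of the pitch glide; steeper `k` values produce a more abrupt, "emotional" frequency jump, while small `k` approaches a smooth glissando.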




Author information

Correspondence to G. Lan or A. S. Fadeev.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article


Cite this article

Lan, G., Fadeev, A.S. Method of Modeling and Harmonic Synthesis of Phonemes of Human Speech with Emotional Coloring. Autom. Doc. Math. Linguist. 57, 219–227 (2023). https://doi.org/10.3103/S0005105523040040
