
A Method of Real-Time Dynamic Measurement of a Speaker’s Emotional State from a Speech Waveform

  • ACOUSTIC MEASUREMENTS
  • Published in: Measurement Techniques

The paper examines problems in implementing voice-interface systems for the remote service of the population. The effectiveness of such systems can be enhanced by automatically analyzing changes in the user's emotional state during a dialogue. To measure an index of the dynamics of the emotional state in real time, it is proposed to exploit the sound (phonetic) variability of the user's speech over observation intervals of small duration (fractions of a minute). Based on an information-theoretic approach, a method is developed for acoustic measurement of the dynamics of the emotional state under small-sample conditions, using a scale-invariant measure of the variations of the speech waveform in the frequency domain. A practical real-time instantiation of the method is examined; in this case, the delay in obtaining measurement results does not exceed 10–20 s. Experimental results confirm the rapid response of the proposed method and its sensitivity to changes in the dynamics of the emotional state under external perturbations. The method can be used for automated monitoring of the quality of users' voice samples in the Unified Biometric System, and it may also enhance security through noncontact detection of potentially dangerous persons with short-term disturbances of the psychoemotional state.
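The full text is behind the paywall, so the Python fragment below is only an illustrative sketch of the approach the abstract describes: it tracks the frame-to-frame variability of the short-term speech spectrum with a symmetrized Itakura–Saito-type divergence, a standard information-theoretic measure that is invariant to a common gain (scale) factor, and averages it over an observation interval of a fraction of a minute. Every function name, frame size, and the choice of divergence here is an assumption made for illustration, not the authors' published implementation.

```python
import numpy as np

def power_spectrum(frame, n_fft=512):
    """Short-term power spectrum of one windowed speech frame (periodogram).
    A small floor keeps the divergence finite in near-silent bins."""
    window = np.hanning(len(frame))
    return np.abs(np.fft.rfft(frame * window, n_fft)) ** 2 + 1e-12

def symmetric_is_divergence(p, q):
    """Symmetrized Itakura-Saito divergence between two power spectra.
    Scaling both p and q by the same gain leaves the ratio p/q, and
    hence the value, unchanged -- a scale-invariant spectral measure."""
    r = p / q
    return 0.5 * float(np.mean(r + 1.0 / r - 2.0))

def emotional_dynamics_index(signal, fs, frame_ms=30, interval_s=15):
    """Hypothetical variability index: mean spectral divergence between
    consecutive frames over one short observation interval (seconds)."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = min(len(signal) // frame_len,
                   int(interval_s * 1000 / frame_ms))
    if n_frames < 2:
        raise ValueError("interval too short for at least two frames")
    spectra = [power_spectrum(signal[i * frame_len:(i + 1) * frame_len])
               for i in range(n_frames)]
    return float(np.mean([symmetric_is_divergence(spectra[i], spectra[i + 1])
                          for i in range(n_frames - 1)]))
```

For a 16 kHz recording, a call such as emotional_dynamics_index(x, 16000) would yield one number per 15 s of speech; comparing the values from successive intervals gives a dynamics track whose latency is on the order of the interval length, consistent with the 10–20 s measurement delay reported in the abstract.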



Funding

This study was performed with the support of the Russian Science Foundation (Project No. 20-71-10010).

Author information


Correspondence to L. V. Savchenko.

Additional information

Translated from Izmeritel’naya Tekhnika, No. 4, pp. 49–57, April, 2021.


About this article


Cite this article

Savchenko, L.V., Savchenko, A.V. A Method of Real-Time Dynamic Measurement of a Speaker’s Emotional State from a Speech Waveform. Meas Tech 64, 319–327 (2021). https://doi.org/10.1007/s11018-021-01935-z
