A measure of differences in speech signals by the voice timbre

Measurement Techniques

Abstract

This research belongs to the field of speech technologies, where the key issue is optimizing speech signal processing under a priori uncertainty about the signal’s fine structure. The problem considered is the automatic (objective) analysis of a speaker’s voice timbre from a speech signal of finite duration. A universal information-theoretic approach is proposed to solve it. Based on the Kullback-Leibler divergence, an expression is obtained for the asymptotically optimal decision statistic for discriminating speech signals by voice timbre. The author highlights a serious obstacle to the practical implementation of such a statistic: the sequence of observations must be synchronized with the pitch of the speech signals. To overcome this obstacle, an objective measure of timbre-based differences in speech signals is proposed in terms of the acoustic theory of speech production and its “acoustic tube” model of the speaker’s vocal tract. The practical implementation of the new measure on the basis of an adaptive recursive filter is considered. A full-scale experiment was set up and carried out. Its results confirmed two main properties of the proposed measure: high sensitivity to differences in voice timbre between speech signals, and invariance with respect to the fundamental (pitch) frequency. The results can be used in the design and study of digital speech processing systems tuned to the speaker’s voice, such as digital voice communication systems and biometric and biomedical systems.
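
For orientation, the textbook relations that connect the Kullback-Leibler divergence to spectral distances of the COSH type can be written as follows. This is a sketch under the Gaussian assumption mentioned in the notes, not necessarily the expression derived in the article; the symbols $S_1(\omega)$ and $S_2(\omega)$ (power spectral densities of the two speech signals) and $\rho$ (divergence rate per sample) are notation assumed here for illustration.

```latex
% Kullback-Leibler divergence rate between two zero-mean stationary
% Gaussian processes with power spectral densities S_1 and S_2
\rho(S_1 \,\|\, S_2) = \frac{1}{4\pi} \int_{-\pi}^{\pi}
  \left[ \frac{S_1(\omega)}{S_2(\omega)}
       - \ln \frac{S_1(\omega)}{S_2(\omega)} - 1 \right] d\omega

% Summing the two directions cancels the logarithms and leaves
% a hyperbolic cosine of the log-spectral difference (COSH distance)
\rho(S_1 \,\|\, S_2) + \rho(S_2 \,\|\, S_1)
  = \frac{1}{2\pi} \int_{-\pi}^{\pi}
    \left[ \cosh\!\left( \ln \frac{S_1(\omega)}{S_2(\omega)} \right) - 1 \right] d\omega
```

Both quantities are zero when the two spectra coincide and grow as the spectral envelopes diverge, which is why they are natural candidates for measuring timbre differences.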

Notes

  1. The assumption of a Gaussian probability distribution does not limit the generality of the conclusions of this study, as this law is characterized by the maximum entropy for a given average power of the speech signal.

  2. COSH: hyperbolic cosine function.

  3. Researchers often prefer Burg’s method over other parametric spectral analysis methods because of its well-known advantages in computational speed and, most importantly, the stability of the autoregressive spectral estimates it produces.

  4. The Phoneme Training phonetic analysis and speech training information system: [website]. URL: https://sites.google.com/site/frompldcreators/produkty-1/phonemetraining (access date: May 18, 2023).

  5. This order is intended for autoregressive modeling of the 4–5 resonances of the amplitude-frequency characteristic (AFC) of a typical vocal tract during vowel pronunciation in the 0 to 4 kHz frequency band.
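
The notes above mention Burg’s method, autoregressive (AR) spectral estimates, and a COSH-type distance. The following is a minimal sketch of how these pieces could fit together, not the article’s implementation (which relies on an adaptive recursive filter). The function names `burg_ar`, `ar_log_spectrum`, `cosh_distance`, and `synth_ar` are hypothetical, the test signal is a synthetic AR(2) process rather than speech, and the order 2 used in the demo is illustrative only. For real vowels, each vocal-tract resonance needs roughly one complex-conjugate pole pair, so 4–5 resonances suggest an order of about 8–10 or slightly more.

```python
import numpy as np

def burg_ar(x, order):
    """Burg estimate of AR coefficients a (a[0] == 1) and innovation variance.

    Assumed model: x[n] + a[1]*x[n-1] + ... + a[p]*x[n-p] = e[n].
    """
    x = np.asarray(x, dtype=float)
    f = x[1:].copy()           # forward prediction errors
    b = x[:-1].copy()          # backward prediction errors (delayed by one)
    a = np.array([1.0])
    var = np.dot(x, x) / len(x)
    for _ in range(order):
        # Reflection coefficient minimizing forward + backward error power
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        # Levinson-type update of the coefficient vector
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        var *= 1.0 - k * k
        # Update both error sequences, then realign them by one sample
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
    return a, var

def ar_log_spectrum(a, var, nfft=512):
    """Log power spectrum ln(var / |A(e^{jw})|^2) on a grid over [0, pi]."""
    A = np.fft.rfft(a, n=nfft)
    return np.log(var) - 2.0 * np.log(np.abs(A))

def cosh_distance(log_s1, log_s2):
    """Grid average of cosh(ln S1 - ln S2) - 1 (approximates the integral)."""
    return float(np.mean(np.cosh(log_s1 - log_s2) - 1.0))

def synth_ar(a, n, rng):
    """Synthetic AR process driven by unit-variance white Gaussian noise."""
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(n):
        acc = e[t]
        for i in range(1, len(a)):
            if t - i >= 0:
                acc -= a[i] * x[t - i]
        x[t] = acc
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical "timbres": AR(2) models with different resonances
    x1 = synth_ar(np.array([1.0, -1.5, 0.8]), 20000, rng)
    x2 = synth_ar(np.array([1.0, -0.5, 0.3]), 20000, rng)
    ls1 = ar_log_spectrum(*burg_ar(x1, order=2))
    ls2 = ar_log_spectrum(*burg_ar(x2, order=2))
    print("COSH distance, signal 1 vs itself:", cosh_distance(ls1, ls1))
    print("COSH distance, signal 1 vs signal 2:", cosh_distance(ls1, ls2))
```

Because Burg’s recursion keeps every reflection coefficient below one in magnitude, the estimated AR polynomial has no zeros on the unit circle, so the logarithm in `ar_log_spectrum` stays finite. Working with the smooth AR envelope rather than the raw periodogram is also what makes a distance of this kind less dependent on the harmonic structure of the pitch, although the sketch does not test that property.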

Author information

Correspondence to V. V. Savchenko.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Translated from Izmeritel’naya Tekhnika, No. 10, pp. 63–69, October, 2023. Russian DOI: https://doi.org/10.32446/0368-1025it.2023-10-63-69.

Original article submitted September 18, 2023; approved after reviewing October 18, 2023; accepted for publication October 18, 2023.

About this article

Cite this article

Savchenko, V.V. A measure of differences in speech signals by the voice timbre. Meas Tech 66, 803–812 (2024). https://doi.org/10.1007/s11018-024-02294-1

  • DOI: https://doi.org/10.1007/s11018-024-02294-1
