Abstract
This research relates to the field of speech technologies, where the key issue is the optimization of speech signal processing under conditions of a prior uncertainty of its fine structure. The problem of automatic (objective) analysis of the speaker’s voice timbre using a speech signal of finite duration is considered. It is proposed to use a universal information-theoretic approach to solve it. Based on the Kullback-Leibler divergence, an expression was obtained to describe the asymptotically optimal decision statistic for differentiating speech signals by the voice timbre. The author highlights a serious obstacle during practical implementation of such statistics, namely: synchronization of the sequence of observations with the pitch of speech signals. To overcome the described obstacle, an objective measure of timbre-based differences in speech signals is proposed in terms of the acoustic theory of speech production and its “acoustic tube” type model of the speaker’s vocal tract. The possibilities of practical implementation of a new measure based on an adaptive recursive filter are considered. A full-scale experiment was set up and carried out. The experimental results confirmed two main properties of the proposed measure: high sensitivity to differences in speech signals in terms of voice timbre and invariance with respect to the fundamental pitch frequency. The obtained results can be used when designing and studying digital speech processing systems tuned to the speaker’s voice, for example, digital voice communication systems, biometric and biomedical systems, etc.
Similar content being viewed by others
Notes
The assumption of a Gaussian probability distribution does not limit the generality of the conclusions of this study, as this law is characterized by the maximum entropy for a given average power of the speech signal.
COSH—cosine hyperbolic function.
Researchers often prefer Berg’s method over other parametric spectral analysis methods due to its well-known advantages in terms of computational speed and, most importantly, stability of the spectral estimates of the autoregressive type that are formed on its basis.
The Phoneme Training phonetic analysis and speech training information system: [website]. URL: https://sites.google.com/site/frompldcreators/produkty-1/phonemetraining (access date: May 18, 2023).
This order is intended for autoregressive simulation of 4–5 AFC resonances of a typical vocal tract when pronouncing vowels in the frequency bandwidth of 0 to 4 kHz.
References
Zhao, R., Erleke, E., Wang, L., Huang, J., Chen, Z.: The effects of timbre on voice interaction. In: Rau, P.-L.P. (ed.) Cross-Cultural Design: HCII 2023, Lecture Notes in Computer Science, vol. 14023. Springer, Cham (2023) https://doi.org/10.1007/978-3-031-35939-2_12
Ando, Y.: Temporal and spatial features of speech signals. In: Signal processing in auditory neuroscience, pp. 81–101. Academic Press, (2019) https://doi.org/10.1016/B978-0-12-815938-5.00009-1
Ternström, S.: Appl. Sci. 13(6), 3514 (2023). https://doi.org/10.3390/app13063514
Song, W., Yue, Y., Zhang, Y., et al.: Multi-speaker multistyle speech synthesis with timbre and style disentanglement. In: Zhenhua, L., Jianqing, G., Kai, Y., Jia, J. (eds.) Man-machine speech communication: NCMMSC 2022, communications in computer and information science. Springer, Singapore (2022) https://doi.org/10.1007/978-981-99-2401-1_12
Jialu, L., Hasegawa-Johnson, M., McElwain, N.L.: Speech. Commun. 133, 41–61 (2021). https://doi.org/10.1016/j.specom.2021.07.010
Savchenko, V.V.: Radioelectron. Commun. Syst. 64(11), 592–603 (2021). https://doi.org/10.3103/S0735272721110030
Savchenko, A.V., Savchenko, V.V.: Meas. Tech. 64(4), 928–935 (2022). https://doi.org/10.1007/s11018-022-02025-4
Wei, Y., Gan, L., Huang, X.: Front. Psychol. 13, 869475 (2022). https://doi.org/10.3389/fpsyg.2022.869475
Xue, J., Zhou, H., Song, H., Wu, B., Shi, L.: Speech. Commun. 147, 41–50 (2023). https://doi.org/10.1016/j.specom.2023.01.001
Li, J., Zhang, L., Qiu, Z.: 5th International Conference on Intelligent Control, Measurement and Signal Processing (ICMSP). Chengdu., pp. 833–837 (2023). https://doi.org/10.1109/ICMSP58539.2023.10171030
Igras-Cybulska, M., Hekiert, D., Cybulski, A., et al.: Work-in-Progress. In: 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) Shanghai. pp. 355–359. (2023) https://doi.org/10.1109/VRW58643.2023.00079
Cui, S., Li, E., Kang, X.: 2020 IEEE International Conference on Multimedia and Expo (ICME). London., pp. 1–6 (2020). https://doi.org/10.1109/ICME46284.2020.9102765
Gupta, S., Fahad, M.S., Deepak, A.: Multimed Tools Appl 79, 23347–23365 (2020). https://doi.org/10.1007/s11042-020-09068-1
Dai, B., Zahorian, S.: J. Acoust. Soc. Am. 104, 1805 (1998). https://doi.org/10.1121/1.423591
Zakhar’ev, V.A., Petrovskii, A.A.: Metody parametrizatsii rechevogo signala na osnove analiza, sinkhronizirovannogo s chastotoi osnovnogo tona v sistemakh konversii golosa. In: Proceedings of the 11th International Scientific and Technical Conference “Nauka – obrazovaniyu, proizvodstvu, ekonomike, vol. 1, pp. 203–204. BNTU, Minsk (2013). in Russian
Savchenko, V.V., Savchenko, L.V.: J. Commun. Technol. Electron. 68(7), 757–764 (2023). https://doi.org/10.1134/S1064226923060128
Savchenko, A.V., Savchenko, V.V.: Radioelectron. Commun. Syst. 64(6), 300–309 (2021). https://doi.org/10.3103/S0735272721060030
Gibson, J.: Information 10(5), 179–189 (2019). https://doi.org/10.3390/info10050179
Herbst, Ch T., Elemans, C.P.H., Tokuda, I.T., Chatziioannou, V., Švec, J.G.: J. Voice (2023). https://doi.org/10.1016/j.jvoice.2022.10.004
Sadok, S., Leglaive, S., Girin, L., Alameda-Pineda, X., Séguier, R.: Speech. Commun. 148, 53–65 (2023). https://doi.org/10.1016/j.specom.2023.02.005
Savchenko, V.V.: J. Commun. Technol. Electron. 64(6), 590–596 (2019). https://doi.org/10.1134/S0033849419060093
Kullback, S.: Information theory and statistics. Dover, New York (1997)
Savchenko, V.V.: Meas. Tech. 66(6), 430–438 (2023). https://doi.org/10.1007/s11018-023-02244-3
Marple Jr., S.L.: Digital spectral analysis, 2nd edn. Dover, New York (2019)
Savchenko, V.V.: Meas. Tech. 66(3), 203–210 (2023). https://doi.org/10.1007/s11018-023-02211-y
Oppenheim, A., Schafer, R.: Discrete-time signal processing, 3rd edn. Pearson (2009)
Kathiresan, Th , Maurer, D., Suter, H., Dellwo, V.: J. Acoust. Soc. Am. 143(3), 1919–1920 (2018). https://doi.org/10.1121/1.5036258
Kovela, S., Valle, R., Dantrey, A., Catanzaro, B.: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes Island., pp. 1–5 (2023). https://doi.org/10.1109/ICASSP49357.2023.10096220
Sun, P., Mahdi, A., Xu, J., Qin, J.: Speech. Commun. 101, 57–69 (2018). https://doi.org/10.1016/j.specom.2018.05.006
Tohyama, M.: Spectral envelope and source signature analysis. In: Acoustic signals and hearing, pp. 89–110. Academic Press, (2020) https://doi.org/10.1016/B978-0-12-816391-7.00013-9
Savchenko, V.V.: Radioelectron. Commun. Syst. 63, 42–54 (2020). https://doi.org/10.3103/S0735272720010045
Eggermont, J.J.: Brain responses to auditory mismatch and novelty detection. Academic Press, pp. 345–376 (2023). https://doi.org/10.1016/B978-0-443-15548-2.00011-9
Oganian, Y., Bhaya-Grossman, I., Johnson, K., Chang, E.: Neuron 111(13), 2105–2118e4 (2023). https://doi.org/10.1016/j.neuron.2023.04.004
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The author declares no conflict of interest.
Additional information
Translated from Izmeritel’naya Tekhnika, No. 10, pp. 63–69, October, 2023. Russian DOI: https://doi.org/10.32446/0368-1025it.2023-10-63-69.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Original article submitted September 18, 2023; approved after reviewing October 18, 2023; accepted for publication October 18, 2023.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Savchenko, V.V. A measure of differences in speech signals by the voice timbre. Meas Tech 66, 803–812 (2024). https://doi.org/10.1007/s11018-024-02294-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11018-024-02294-1
Keywords
- Acoustic measurements
- Speech acoustics
- Speech signal
- (fundamental) pitch
- Vocal tract
- All-pole model
- Berg’s method