A radius-incorporated localized multiple kernel learning algorithm for detecting depression in speech

Jiang, Haihua; Hu, Bin; Liu, Zhenyu; Wang, Gang; Zhang, Lan

doi:10.1007/s10772-023-10017-0


Published: 23 January 2023

Volume 26, pages 371–378 (2023)

International Journal of Speech Technology

Haihua Jiang (ORCID: 0000-0002-4109-4061)^1,2,
Bin Hu^2,3,
Zhenyu Liu^3,
Gang Wang^4 &
Lan Zhang^5


Abstract

Early intervention for depression could help reduce the disease burden, but objective diagnostic methods are lacking. This study investigated automatic depression classification on a speech dataset of 85 healthy controls (51 females and 34 males) and 85 depressed patients (53 females and 32 males). Because different types of speech features differ markedly in classification performance, we propose a radius-incorporated localized multiple kernel learning (trLMKL) algorithm for detecting depression in speech that makes the best use of these features. To improve classification accuracy, the algorithm combines information from both the margin and the radius of the minimum enclosing ball (MEB) to learn the gating model parameters. Rather than incorporating the MEB radius directly, we incorporate the trace of the total scatter matrix of the training data, which avoids recomputing the radius at each iteration and reduces the computational complexity. Comprehensive experiments were carried out on our depressed speech dataset and 10 UCI datasets. Overall, our algorithm achieved better classification performance than SimpleMKL and LMKL and detected depression efficiently, indicating its potential as a diagnostic method for depression.
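
The two ingredients named in the abstract can be illustrated with a minimal sketch. It assumes the softmax gating model and locally combined kernel of standard LMKL (Gönen & Alpaydın, 2008) and the usual kernel-space identity tr(S_t) = tr(K) - (1/n) Σ_ij K_ij for the trace of the total scatter matrix. The function names, the RBF base kernels, and the random gating parameters are illustrative assumptions. The abstract does not state how trLMKL weights the trace term in its objective, so the sketch only computes the quantities involved and is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (illustrative base kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def gating(X, V, v0):
    """Softmax gating: eta[i, m] is the weight of kernel m at sample i."""
    scores = X @ V + v0                           # shape (n_samples, n_kernels)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def localized_kernel(kernels, eta):
    """Locally combined kernel: K[i, j] = sum_m eta[i, m] * K_m[i, j] * eta[j, m]."""
    return sum(np.outer(eta[:, m], eta[:, m]) * K for m, K in enumerate(kernels))

def scatter_trace(K):
    """Trace of the total scatter matrix in the feature space induced by K."""
    n = K.shape[0]
    return np.trace(K) - K.sum() / n

# Toy data: 6 samples, 3 features, 2 base kernels (all values are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
kernels = [rbf_kernel(X, g) for g in (0.1, 1.0)]
V = rng.normal(size=(3, 2))                       # hypothetical gating weights
v0 = np.zeros(2)                                  # hypothetical gating biases
eta = gating(X, V, v0)
K_loc = localized_kernel(kernels, eta)
print("trace of total scatter:", scatter_trace(K_loc))
```

Unlike the MEB radius, which requires solving a quadratic program for each candidate kernel, the trace is a closed-form function of the kernel matrix and can be re-evaluated cheaply whenever the gating parameters change.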


Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

References

Airas, M. (2008). TKK Aparat: An environment for voice inverse filtering and parameterization. Logopedics Phoniatrics Vocology, 33, 49–64.
Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Gedeon, T., Breakspear, M., & Parker, G. (2013). A comparative study of different classifiers for detecting depression from spontaneous speech. In Proceedings of ICASSP 2013, (pp. 8022–8026). IEEE
Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Machine Learning, 46, 131–159.
Chen, J., & Liu, Y. (2011). Locally linear embedding: A survey. Artificial Intelligence Review, 36, 29–48.
Chung, K. M., Kao, W. C., Sun, C. L., Wang, L. L., & Lin, C. J. (2003). Radius margin bounds for support vector machines with the RBF kernel. Neural Computation, 15, 2643–2681.
Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J., & Quatieri, T. F. (2015). A review of depression and suicide risk assessment using speech analysis. Speech Communication, 71, 10–49.
Cummins, N., Epps, J., Sethu, V., & Krajewski, J. (2014). Variability compensation in small data: Oversampled extraction of I-vectors for the classification of depressed speech. In Proceedings of ICASSP 2014, (pp. 970–974). IEEE
Dua, D., & Karra Taniskidou, E. UCI machine learning repository. University of California, School of Information and Computer Science. Retrieved 2021, from http://archive.ics.uci.edu/ml.
Eyben, F., Wöllmer, M., & Schuller, B. (2010). openSMILE: The Munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM international conference on multimedia, (pp. 1459–1462). Association for Computing Machinery
Gönen, M., & Alpaydın, E. (2008). Localized multiple kernel learning. In Proceedings of the 25th international conference on machine learning, (pp. 352–359). Springer-Verlag
Gönen, M., & Alpaydın, E. (2013). Localized algorithms for multiple kernel learning. Pattern Recognition, 46, 795–807.
Hawton, K., Comabella, C. C. I., Haw, C., & Saunders, K. (2013). Risk factors for suicide in individuals with depression: A systematic review. Journal of Affective Disorders, 147, 17–28.
He, L., & Cao, C. (2018). Automated depression analysis using convolutional neural networks from speech. Journal of Biomedical Informatics, 83, 103–111.
Hu, M., Chen, Y., & Kwok, J. T. Y. (2009). Building sparse multiple kernel SVM classifiers. IEEE Transactions on Neural Networks, 20, 827–839.
Huang, K. Y., Wu, C. H., Su, M. H., & Kuo, Y. T. (2020). Detecting unipolar and bipolar depressive disorders from elicited speech responses using latent affective structure model. IEEE Transactions on Affective Computing, 11, 393–404.
Jiang, H. H., Hu, B., Liu, Z. Y., Wang, G., Zhang, L., Li, X. Y., & Kang, H. Y. (2018). Detecting depression using an ensemble logistic regression model based on multiple speech features. Computational and Mathematical Methods in Medicine, 9, 1–9.
Jiang, H. H., Hu, B., Liu, Z. Y., Yan, L. H., Wang, T. Y., Liu, F., Kang, H. Y., & Li, X. Y. (2017). Investigation of different speech types and emotions for detecting depression using different classifiers. Speech Communication, 90, 39–46.
Liu, X. W., Wang, L., Yin, J. P., Zhu, E., & Zhang, J. (2013). An efficient approach to integrating radius information into multiple kernel learning. IEEE Transactions on Cybernetics, 43, 557–569.
Low, L. A., Maddage, N. C., Lech, M., Sheeber, L. B., & Allen, N. B. (2011). Detection of clinical depression in adolescents’ speech during family interactions. IEEE Transactions on Biomedical Engineering, 58, 574–586.
Moore, E., Clements, M., Peifer, J. W., & Weisser, L. (2008). Critical analysis of the impact of glottal features in the classification of clinical depression in speech. IEEE Transactions on Biomedical Engineering, 55, 96–107.
Nolen-Hoeksema, S., & Girgus, J. S. (1994). The emergence of gender differences in depression during adolescence. Psychological Bulletin, 115, 424–443.
Ooi, K. E. B., Lech, M., & Allen, N. B. (2014). Prediction of major depression in adolescents using an optimized multi-channel weighted speech classification system. Biomedical Signal Processing and Control, 14, 228–239.
Rakotomamonjy, A., Bach, F., Grandvalet, Y., & Canu, S. (2008). SimpleMKL. Journal of Machine Learning Research, 9, 2491–2521.
Scherer, S., Stratou, G., Gratch, J., & Morency, L. P. (2013). Investigating voice quality as a speaker-independent indicator of depression and PTSD. In Proceedings of Interspeech 2013, (pp. 847–851). ISCA
Sobin, C., & Sackeim, H. A. (1997). Psychomotor symptoms of depression. American Journal of Psychiatry, 154, 4–17.
Wang, L. (2008). Feature selection with kernel class separability. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1534–1546.
World Health Organization. (2021, September 13). Depression fact sheet. WHO, Geneva, Switzerland. Retrieved January 27, 2022, from http://www.who.int/en/news-room/fact-sheets/detail/depression.
Xu, X., Tsang, I. W., & Xu, D. (2013). Soft margin multiple kernel learning. IEEE Transactions on Neural Networks and Learning Systems, 24, 749–761.
Xu, Z., Jin, R., Yang, H., King, I., & Lyu, M. R. (2010). Simple and efficient multiple kernel learning by group Lasso. In Proceedings of the 27th international conference on machine learning, (pp. 1175–1182). Omnipress
Zhao, Z., Bao, Z., Zhang, Z., Cummins, N., & Schuller, B. (2020). Hierarchical attention transfer networks for depression assessment from speech. In Proceedings of ICASSP 2020, (pp. 7159–7163). IEEE


Acknowledgements

This work was supported by the National Basic Research Program of China (973 Program) (No. 2014CB744600).

Author information

Authors and Affiliations

School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, China
Haihua Jiang
Faculty of Information Technology, Beijing University of Technology, Beijing, China
Haihua Jiang & Bin Hu
School of Information Science and Engineering, Lanzhou University, Lanzhou, China
Bin Hu & Zhenyu Liu
Beijing Anding Hospital of Capital Medical University, Beijing, China
Gang Wang
Lanzhou University Second Hospital, Lanzhou, China
Lan Zhang


Corresponding authors

Correspondence to Haihua Jiang or Bin Hu.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Jiang, H., Hu, B., Liu, Z. et al. A radius-incorporated localized multiple kernel learning algorithm for detecting depression in speech. Int J Speech Technol 26, 371–378 (2023). https://doi.org/10.1007/s10772-023-10017-0


Received: 27 January 2022
Accepted: 08 January 2023
Published: 23 January 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s10772-023-10017-0

