Cluster Computing, Volume 22, Supplement 4, pp 7873–7884

Research on depression detection algorithm combine acoustic rhythm with sparse face recognition

  • Jian Zhao
  • Weiwen Su
  • Jian Jia (corresponding author)
  • Chao Zhang
  • Tingting Lu


Because traditional methods of diagnosing depression suffer from a high false-positive rate, this paper proposes a multimodal fusion algorithm for depression diagnosis based on speech signals and facial image sequences. Spectral subtraction is introduced to enhance the depressed speech signal, and the cepstrum method is used to extract pitch-frequency features with a large variation rate and formant features that differ significantly across emotions. The short-time energy and Mel-frequency cepstral coefficient (MFCC) parameters of speech under different emotions are analyzed in both the time domain and the frequency domain, and a model is established for training and recognition. Meanwhile, the orthogonal matching pursuit (OMP) algorithm is implemented to obtain a sparse linear combination of face test samples, which is then fused with the speech result in a proportion-weighted cascade. Experimental results show that the recognition rate of the proposed fusion algorithm for speech and facial emotions reaches 81.14%. Compared with the reported physician accuracy of 47.3%, this amounts to a 71.54% relative improvement. Moreover, the method can be deployed at low cost on the software and hardware of existing hospital instruments. It is therefore an accurate and effective method for diagnosing depression.
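The speech-enhancement step described in the abstract can be illustrated with a minimal sketch of magnitude spectral subtraction. This is a generic textbook version, not the paper's exact implementation: the frame length, overlap, and the assumption that the first few frames contain only noise are all illustrative choices.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, noise_frames=5):
    """Basic magnitude spectral subtraction for speech enhancement.

    The noise magnitude spectrum is estimated from the first
    `noise_frames` frames, assumed to contain no speech, then
    subtracted from every frame (floored at zero), keeping the
    noisy phase. Frames are overlap-added with a Hann window.
    """
    hop = frame_len // 2
    window = np.hanning(frame_len)
    n_frames = (len(signal) - frame_len) // hop + 1

    # Estimate the average noise magnitude spectrum from leading frames.
    noise_mag = np.zeros(frame_len)
    for i in range(noise_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        noise_mag += np.abs(np.fft.fft(frame))
    noise_mag /= noise_frames

    out = np.zeros(len(signal))
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        spec = np.fft.fft(frame)
        # Subtract the noise magnitude, keep the noisy phase.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        enhanced = np.real(np.fft.ifft(mag * np.exp(1j * np.angle(spec))))
        out[i * hop:i * hop + frame_len] += enhanced
    return out

# Toy usage: a 440 Hz tone preceded by a noise-only lead-in.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(fs // 4),
                        np.sin(2 * np.pi * 440 * t[fs // 4:])])
noisy = clean + 0.3 * rng.standard_normal(len(t))
enhanced = spectral_subtraction(noisy)
```

In the noise-only lead-in, the enhanced signal's energy drops well below that of the noisy input, which is the intended effect before pitch and formant features are extracted.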


Keywords: Depression diagnosis; Pitch frequency; Sparse representation; Speech signal; Facial emotions; Multimodal fusion



This work was supported by National Natural Science Foundation of China (No. 61379010).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  • Jian Zhao (1)
  • Weiwen Su (1)
  • Jian Jia (2), corresponding author
  • Chao Zhang (1)
  • Tingting Lu (1)

  1. School of Information Science and Technology, Northwest University, Xi'an, China
  2. School of Mathematics, Northwest University, Xi'an, China
