Pathophysiological Voice Analysis for Diagnosis and Monitoring of Depression

  • Shinichi Tokuno


Self-assessment questionnaires are commonly used to screen for stress and depression. However, they are subject to reporting bias: respondents may consciously or unconsciously underestimate or overestimate their symptoms. Various biomarkers of depression and stress are also being studied. These often require expensive equipment and chemicals; they are useful for definitive diagnosis and for elucidating mechanisms, but they are not suitable for screening large populations.

It is well known that various diseases change the voice. The relationship between disease and voice has long been studied in the field of acoustic phonetics. Studies have mainly focused on the formants (F1, F2, etc.), the frequency bands obtained by cepstrum analysis of the voice. Formants are influenced by the shape of the vocal tract (the cavity from the vocal cords to the mouth). Studies using the fundamental frequency (F0), which is obtained as the lowest frequency by FFT, have also been reported. F0 is affected by vocal cord vibration, and various methods of F0 analysis are currently available. F0 contains many involuntary components compared with the formants. Therefore, F0 analysis could potentially be used to diagnose various diseases. The range of application of voice analysis has now expanded from otolaryngology to psychiatric conditions such as depression and neurological diseases such as Parkinson’s disease. In addition, research on differential diagnosis by voice and on measuring therapeutic effect has begun.
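As a rough illustration of what estimating F0 from a voice signal involves, the following Python sketch synthesizes a toy voiced sound and recovers its fundamental frequency. It is only a minimal sketch under stated assumptions, not the analysis method used by the author: it uses autocorrelation (computed via FFT) rather than reading the lowest spectral peak directly, and the function names, the 16 kHz sample rate, and the 60 to 400 Hz search range are all hypothetical choices made for the example.

```python
import numpy as np


def synth_voiced(f0_hz, sample_rate=16000, duration_s=0.5, n_harmonics=5):
    """Synthesize a toy voiced sound: harmonics of f0_hz with 1/k amplitudes."""
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    signal = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        signal += np.sin(2 * np.pi * k * f0_hz * t) / k
    return signal


def estimate_f0_autocorr(signal, sample_rate=16000, fmin=60.0, fmax=400.0):
    """Estimate F0 from the strongest autocorrelation peak in [fmin, fmax].

    Assumption: the search range roughly covers adult speaking pitch.
    """
    x = signal - np.mean(signal)
    n = len(x)
    # Autocorrelation via FFT; zero-padding to 2n avoids circular wrap-around.
    spectrum = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(spectrum * np.conj(spectrum))[:n]
    # Convert the frequency search range into a lag range (in samples).
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


if __name__ == "__main__":
    voiced = synth_voiced(125.0)
    print(f"estimated F0: {estimate_f0_autocorr(voiced):.1f} Hz")
```

Because the lag is an integer number of samples, the estimate is quantized; real speech would additionally need framing, voicing detection, and handling of octave errors, none of which this sketch attempts.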

Such developments are largely due to advances in computing, especially the spread of smartphones, which have made voice collection and analysis possible in everyday life. For example, several smartphone applications that measure stress and depression by analyzing everyday speech have been released. In Japan, efforts to use such applications in healthcare and industrial medicine are gaining momentum. Our group has already developed the Mind Monitoring System, which runs on smartphones, and operates it in Japan. This system is based on emotion recognition technology rather than on direct voice analysis. Pathophysiological analysis by voice is noninvasive, can be performed remotely and continuously, and requires no special equipment. Therefore, this technique is effective for screening large numbers of subjects and for long-term continuous monitoring at home. This means that the technology can serve as a bridge between healthcare and medical treatment. In clinical practice, it can also provide objective indicators in medical fields that have so far had only subjective ones.


Keywords: Voice analysis, Fundamental frequency, Emotion recognition



I thank Shunji Mitsuyoshi, Shuji Shinohara, Mitsuteru Nakamura, Masakazu Higuchi, Yasuhiro Omiya, and Naoki Hagiwara. They are the members of my team, each pursuing research with original ideas and outstanding skills.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Verbal Analysis of Pathophysiology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
