“You’re as Sick as You Sound”: Using Computational Approaches for Modeling Speaker State to Gauge Illness and Recovery

  • Julia Hirschberg
  • Anna Hjalmarsson
  • Noémie Elhadad


Recently, researchers in computer science and engineering have begun to explore speech-based correlates of various medical conditions using automatic, computational methods. If such language cues can be identified and quantified automatically, this information can support the diagnosis and treatment of medical conditions in clinical settings and further fundamental research in understanding cognition. This chapter reviews computational approaches that explore the communicative patterns of patients with medical conditions such as depression, autism spectrum disorders, schizophrenia, and cancer. Two main approaches are discussed: research that explores features extracted from the acoustic signal, and research that focuses on lexical and semantic features. We also present applied research that uses computational methods to develop assistive technologies. In the final sections we discuss open issues and the future of this emerging field of research.


Keywords: Modeling speaker state using computational methods · Speech processing · Medical disabilities · Depression · Suicide · Autism spectrum disorder · Schizophrenia · Cancer · Aphasia · Acoustic signals · Lexical and semantic features · Mapping language cues to medical conditions



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Julia Hirschberg 1
  • Anna Hjalmarsson
  • Noémie Elhadad

  1. Department of Computer Science, Columbia University, New York, USA
