Abstract
Recently, researchers in computer science and engineering have begun to explore the possibility of finding speech-based correlates of various medical conditions using automatic, computational methods. If such language cues can be identified and quantified automatically, this information can be used to support the diagnosis and treatment of medical conditions in clinical settings and to further fundamental research in understanding cognition. This chapter reviews computational approaches that explore the communicative patterns of patients with medical conditions such as depression, autism spectrum disorders, schizophrenia, and cancer. We discuss two main lines of research: work on features extracted from the acoustic signal and work on lexical and semantic features. We also present applied research that uses computational methods to develop assistive technologies. The final sections discuss open issues in, and the future of, this emerging field of research.
© 2010 Springer Science+Business Media, LLC
Hirschberg, J., Hjalmarsson, A., Elhadad, N. (2010). “You’re as Sick as You Sound”: Using Computational Approaches for Modeling Speaker State to Gauge Illness and Recovery. In: Neustein, A. (eds) Advances in Speech Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5951-5_13
Print ISBN: 978-1-4419-5950-8
Online ISBN: 978-1-4419-5951-5