
Enabling Speech Emotional Intelligence as a Service in Homecare Platforms

Chapter in Pervasive Healthcare

Abstract

In recent years, the field of affective computing, which deals with the identification, recording, interpretation, and processing of an individual's emotions and affective states, has gained ground in the scientific community. Incorporating affective computing, and the emotional intelligence it enables, into homecare services, which deliver social and health care at the patient's home through information and communication technology, therefore appears quite important. Among the available means of expression, speech is one of the most natural, and it carries adequate information for recognizing emotion. In this chapter, we describe the design and implementation of an affective recognition service integrated into a holistic electronic homecare management system that covers the entire lifecycle of doctor-patient interaction and incorporates speech emotion recognition (SER) methods. Within this context, we evaluate the performance of several SER techniques deployed in the homecare system, ranging from well-established machine learning algorithms to deep learning architectures, and we report the corresponding results.
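The chapter body itself is not reproduced on this page, so the authors' exact pipeline is unavailable here. As a rough, hypothetical sketch of the "well-established machine learning" end of the spectrum the abstract mentions, the Python snippet below summarizes each utterance with MFCC statistics and trains an SVM classifier; every name, parameter, and the stand-in corpus are illustrative assumptions, not the authors' implementation.

```python
# A minimal, self-contained SER sketch (an assumption, not the chapter's exact
# pipeline): summarize each utterance by MFCC statistics, then classify with an SVM.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SR = 16000  # assumed sampling rate

def utterance_features(signal: np.ndarray, sr: int = SR) -> np.ndarray:
    """Mean and standard deviation of 13 MFCC trajectories -> a 26-dim vector."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Stand-in corpus: random signals with random labels so the sketch runs end to
# end; a real experiment would load labeled recordings (e.g., via librosa.load).
rng = np.random.default_rng(0)
signals = [rng.standard_normal(2 * SR).astype(np.float32) for _ in range(40)]
labels = rng.choice(["neutral", "happy", "angry", "sad"], size=40)

X = np.stack([utterance_features(s) for s in signals])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

A deep learning variant of the kind the abstract alludes to would typically replace the summary statistics and the SVM with a convolutional or recurrent network operating on log-mel spectrogram frames.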



Author information

Correspondence to Ilias Maglogiannis.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Stavrianos, P., Pavlopoulos, A., Maglogiannis, I. (2022). Enabling Speech Emotional Intelligence as a Service in Homecare Platforms. In: Husain, M.S., Adnan, M.H.B.M., Khan, M.Z., Shukla, S., Khan, F.U. (eds) Pervasive Healthcare. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-77746-3_9


  • DOI: https://doi.org/10.1007/978-3-030-77746-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77745-6

  • Online ISBN: 978-3-030-77746-3

  • eBook Packages: Engineering, Engineering (R0)
