Abstract
In recent years, affective computing, the field concerned with the identification, recording, interpretation, and processing of an individual's emotions and affective states, has gained ground in the scientific community. Incorporating affective computing, and the emotional intelligence it enables, into homecare services, which deliver social and healthcare support at the patient's home through information and communication technology, therefore appears highly valuable. Among the available means of expression, speech is one of the most natural and carries sufficient information for recognizing emotion. In this paper, we describe the design and implementation of an affect recognition service, based on speech emotion recognition (SER) methods, integrated into a holistic electronic homecare management system that covers the entire lifecycle of doctor-patient interaction. Within this context, we evaluate the performance of several SER techniques deployed in the homecare system, ranging from well-established machine learning algorithms to deep learning architectures, and report the corresponding results.
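The baseline pipeline the abstract alludes to, hand-crafted acoustic features fed to a conventional classifier, can be sketched minimally as follows. This is an illustrative toy, not the chapter's actual setup: the two prosodic features (mean pitch, mean energy), the synthetic emotion clusters, and the nearest-centroid classifier are all assumptions made for the sketch.

```python
import math
import random

random.seed(0)

# Hypothetical per-utterance acoustic features: (mean pitch in Hz, mean energy).
# "Angry" speech is simulated with higher pitch and energy than "sad" speech.
def synth_utterance(emotion):
    if emotion == "angry":
        return (random.gauss(220, 20), random.gauss(0.8, 0.1))
    return (random.gauss(140, 20), random.gauss(0.3, 0.1))

train = [(synth_utterance(e), e) for e in ["angry", "sad"] * 50]

# Nearest-centroid classifier: average the feature vectors per emotion class.
def centroid(label):
    feats = [f for f, e in train if e == label]
    return tuple(sum(dim) / len(feats) for dim in zip(*feats))

centroids = {e: centroid(e) for e in ("angry", "sad")}

def classify(features):
    # Predict the emotion whose centroid is closest in feature space.
    return min(centroids, key=lambda e: math.dist(features, centroids[e]))

test = [(synth_utterance(e), e) for e in ["angry", "sad"] * 20]
accuracy = sum(classify(f) == e for f, e in test) / len(test)
print(f"accuracy: {accuracy:.2f}")
```

In a real SER system the feature vector would come from audio analysis (e.g., MFCCs, pitch, and energy statistics extracted with a library such as pyAudioAnalysis), and the classifier would be an SVM, decision tree, or deep network rather than a centroid rule.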
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Stavrianos, P., Pavlopoulos, A., Maglogiannis, I. (2022). Enabling Speech Emotional Intelligence as a Service in Homecare Platforms. In: Husain, M.S., Adnan, M.H.B.M., Khan, M.Z., Shukla, S., Khan, F.U. (eds) Pervasive Healthcare. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-77746-3_9
Print ISBN: 978-3-030-77745-6
Online ISBN: 978-3-030-77746-3