Abstract
In recent years, affective computing, the field concerned with the identification, recording, interpretation, and processing of an individual's emotions and affective states, has gained ground in the scientific community. Incorporating affective computing, and the emotional intelligence it enables, into homecare services, which deliver social and healthcare support at the patient's home through information and communication technology, therefore appears highly valuable. Among the available means of expression, speech is one of the most natural and carries sufficient information for recognizing emotion. In this paper, we describe the design and implementation of an affect recognition service, based on speech emotion recognition (SER) methods, integrated into a holistic electronic homecare management system that covers the entire lifecycle of doctor-patient interaction. Within this context, we evaluate the performance of several SER techniques deployed in the homecare system, ranging from well-established machine learning algorithms to deep learning architectures, and report the corresponding results.
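The baseline pipeline the abstract alludes to, hand-crafted acoustic features fed to a conventional classifier, can be sketched minimally as follows. This is an illustrative toy, not the chapter's actual setup: the two prosodic features (mean pitch, mean energy), the synthetic emotion clusters, and the nearest-centroid classifier are all assumptions made for the sketch.

```python
import math
import random

random.seed(0)

# Hypothetical per-utterance acoustic features: (mean pitch in Hz, mean energy).
# "Angry" speech is simulated with higher pitch and energy than "sad" speech.
def synth_utterance(emotion):
    if emotion == "angry":
        return (random.gauss(220, 20), random.gauss(0.8, 0.1))
    return (random.gauss(140, 20), random.gauss(0.3, 0.1))

train = [(synth_utterance(e), e) for e in ["angry", "sad"] * 50]

# Nearest-centroid classifier: average the feature vectors per emotion class.
def centroid(label):
    feats = [f for f, e in train if e == label]
    return tuple(sum(dim) / len(feats) for dim in zip(*feats))

centroids = {e: centroid(e) for e in ("angry", "sad")}

def classify(features):
    # Predict the emotion whose centroid is closest in feature space.
    return min(centroids, key=lambda e: math.dist(features, centroids[e]))

test = [(synth_utterance(e), e) for e in ["angry", "sad"] * 20]
accuracy = sum(classify(f) == e for f, e in test) / len(test)
print(f"accuracy: {accuracy:.2f}")
```

In a real SER system the feature vector would come from audio analysis (e.g., MFCCs, pitch, and energy statistics extracted with a library such as pyAudioAnalysis), and the classifier would be an SVM, decision tree, or deep network rather than a centroid rule.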
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Stavrianos, P., Pavlopoulos, A., Maglogiannis, I. (2022). Enabling Speech Emotional Intelligence as a Service in Homecare Platforms. In: Husain, M.S., Adnan, M.H.B.M., Khan, M.Z., Shukla, S., Khan, F.U. (eds) Pervasive Healthcare. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-77746-3_9
Print ISBN: 978-3-030-77745-6
Online ISBN: 978-3-030-77746-3