Abstract
There has been growing focus on the use of artificial intelligence and machine learning in affective computing to further enhance user experience through emotion recognition. Machine learning models for affective computing are typically trained on manually extracted features from biological signals, and such features may not generalize well to large datasets. One approach to addressing this issue is to learn latent representations with fully supervised deep learning; however, this requires human-labelled data, which may be unavailable. In this work, we propose an unsupervised framework for representation learning. The proposed framework utilizes two stacked convolutional autoencoders to learn latent representations from wearable electrocardiogram (ECG) and electrodermal activity (EDA) signals. The representations learned by this unsupervised framework are then used within a random forest model to classify arousal. To validate the framework, we create an aggregation of the AMIGOS, ASCERTAIN, CLEAS, and MAHNOB-HCI datasets. We compare our proposed method with others, including convolutional neural networks and methods that rely on manually extracted features, and show that it outperforms current state-of-the-art results. These results demonstrate the widespread applicability of stacked convolutional autoencoders for affective computing.
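As an illustrative sketch of the pipeline described above, the snippet below trains a small 1D convolutional autoencoder on toy signal segments and feeds its flattened latent codes to a random forest classifier, using Keras and scikit-learn (both of which the work builds on). Note this is a minimal sketch with a single autoencoder; the proposed framework uses one autoencoder per modality (ECG and EDA). All layer sizes, the segment length, and hyperparameters here are assumptions for illustration, not the authors' configuration.

```python
# Sketch: 1D convolutional autoencoder -> latent features -> random forest.
# Layer sizes and segment length are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models
from sklearn.ensemble import RandomForestClassifier

SEG_LEN = 128  # assumed segment length; the paper's windowing may differ


def build_autoencoder(seg_len=SEG_LEN):
    inp = layers.Input(shape=(seg_len, 1))
    # Encoder: convolutions + pooling compress the signal into a latent code
    x = layers.Conv1D(16, 5, activation="relu", padding="same")(inp)
    x = layers.MaxPooling1D(4)(x)
    x = layers.Conv1D(8, 5, activation="relu", padding="same")(x)
    latent = layers.MaxPooling1D(4)(x)
    # Decoder: convolutions + upsampling reconstruct the input signal
    x = layers.Conv1D(8, 5, activation="relu", padding="same")(latent)
    x = layers.UpSampling1D(4)(x)
    x = layers.Conv1D(16, 5, activation="relu", padding="same")(x)
    x = layers.UpSampling1D(4)(x)
    out = layers.Conv1D(1, 5, activation="linear", padding="same")(x)
    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, latent)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder


# Toy data standing in for ECG/EDA segments and binary arousal labels
rng = np.random.default_rng(0)
signals = rng.standard_normal((64, SEG_LEN, 1)).astype("float32")
labels = rng.integers(0, 2, 64)

# Unsupervised stage: the autoencoder is trained to reconstruct its input
autoencoder, encoder = build_autoencoder()
autoencoder.fit(signals, signals, epochs=1, batch_size=16, verbose=0)

# Supervised stage: flatten latent codes and train the classifier on them
features = encoder.predict(signals, verbose=0).reshape(len(signals), -1)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(features, labels)
```

The key design point is that only the random forest sees the labels; the representation itself is learned purely from reconstruction, which is what lets the method scale to unlabelled wearable data.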
Data availability
The AMIGOS dataset analysed during the current study is available in the AMIGOS dataset repository, http://www.eecs.qmul.ac.uk/mmv/datasets/amigos/index.html. The ASCERTAIN dataset is available in the Multimedia and Human Understanding Group repository, http://mhug.disi.unitn.it/wp-content/ASCERTAIN/ascertain.html#/. The MAHNOB-HCI dataset is available in the HCI Tagging Database repository, https://mahnob-db.eu/hci-tagging/. The CLEAS dataset is available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to acknowledge the Canadian Department of National Defence, which partially funded this work. The authors also thank Pritam Sarkar, Dr. Dirk Rodenburg, Dr. Aaron Ruberto, Dr. Adam Szulewski, and Dr. Daniel Howes for their collaboration throughout this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Ross, K., Hungler, P. & Etemad, A. Unsupervised multi-modal representation learning for affective computing with multi-corpus wearable data. J Ambient Intell Human Comput 14, 3199–3224 (2023). https://doi.org/10.1007/s12652-021-03462-9