
Unsupervised multi-modal representation learning for affective computing with multi-corpus wearable data


There has been a growing focus on the use of artificial intelligence and machine learning for affective computing, with the aim of enhancing user experience through emotion recognition. Typically, machine learning models for affective computing are trained on manually extracted features from biological signals, and such features may not generalize well to large datasets. One approach to address this issue is to use fully supervised deep learning to learn latent representations. However, this requires human supervision to label the data, which may be unavailable. In this work, we propose an unsupervised framework for representation learning. The proposed framework utilizes two stacked convolutional autoencoders to learn latent representations from wearable electrocardiogram (ECG) and electrodermal activity (EDA) signals. The representations learned by this unsupervised framework are subsequently used within a random forest model to classify arousal. To validate the framework, an aggregation of the AMIGOS, ASCERTAIN, CLEAS, and MAHNOB-HCI datasets is created. The results of our proposed method are compared with other methods, including convolutional neural networks as well as methods that rely on manually extracted features, and we show that our method outperforms current state-of-the-art results. These results demonstrate the broad applicability of stacked convolutional autoencoders to affective computing.
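The pipeline described above can be sketched as follows: a convolutional autoencoder is trained in an unsupervised manner to reconstruct raw signal windows, and its encoder output (the latent representation) is then fed to a random forest for arousal classification. This is a minimal illustrative sketch, not the authors' exact architecture: the layer sizes, window length of 256 samples, single signal channel, and random placeholder data are all assumptions, and the paper uses two such autoencoders (one per modality, ECG and EDA) whereas this sketch shows a single one.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 256, 1)).astype("float32")  # 64 signal windows, 256 samples, 1 channel (placeholder data)
y = rng.integers(0, 2, size=64)                      # hypothetical binary arousal labels

# Encoder: stacked Conv1D + pooling layers compress each window to a latent vector
inp = layers.Input(shape=(256, 1))
h = layers.Conv1D(16, 5, activation="relu", padding="same")(inp)
h = layers.MaxPooling1D(4)(h)
h = layers.Conv1D(8, 5, activation="relu", padding="same")(h)
h = layers.MaxPooling1D(4)(h)                        # shape: (16, 8)
latent = layers.Flatten()(h)                         # 128-dimensional latent representation

# Decoder mirrors the encoder to reconstruct the input signal
d = layers.Reshape((16, 8))(latent)
d = layers.UpSampling1D(4)(d)
d = layers.Conv1D(16, 5, activation="relu", padding="same")(d)
d = layers.UpSampling1D(4)(d)
out = layers.Conv1D(1, 5, activation="linear", padding="same")(d)

autoencoder = keras.Model(inp, out)
encoder = keras.Model(inp, latent)

# Unsupervised training: the reconstruction target is the input itself, so no labels are needed
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=1, batch_size=16, verbose=0)

# The learned representations are then classified with a random forest
Z = encoder.predict(X, verbose=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Z, y)
```

In practice each modality (ECG, EDA) would get its own autoencoder, and the latent vectors would be concatenated before classification; the labels above are only used by the random forest, never by the autoencoder.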



Data availability

The AMIGOS dataset analysed during the current study is available in the AMIGOS dataset repository. The ASCERTAIN dataset is available in the Multimedia and Human Understanding Group repository. The MAHNOB-HCI dataset is available in the HCI Tagging Database repository. The CLEAS dataset is available from the corresponding author on reasonable request.




The authors would like to acknowledge the Canadian Department of National Defence, which partially funded this work. The authors also thank Pritam Sarkar, Dr. Dirk Rodenburg, Dr. Aaron Ruberto, Dr. Adam Szulewski, and Dr. Daniel Howes for their collaboration throughout this study.

Author information



Corresponding author

Correspondence to Kyle Ross.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ross, K., Hungler, P. & Etemad, A. Unsupervised multi-modal representation learning for affective computing with multi-corpus wearable data. J Ambient Intell Human Comput 14, 3199–3224 (2023).
