Semi-Supervised Dictionary Learning of Sparse Representations for Emotion Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8183)


This work presents a technique for classifying emotions in human-computer interaction. Based on biophysiological data, a dictionary learning approach generates sparse representations of blood volume pulse signals, and these representations serve as features for classifying the current emotion. Unlabeled data, i.e. data without class-membership information, is used to enrich the dictionary learning stage. The dictionaries learnt this way represent the underlying structure of the data better, and classification rates improve as a result. The approach is validated in several classification experiments, and the results are discussed together with its benefits and existing limitations.
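The pipeline outlined above (learn a dictionary from labeled and unlabeled signals, encode each signal sparsely, classify the codes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the signals are synthetic noisy sinusoids standing in for blood volume pulse windows, sparse coding uses a plain orthogonal matching pursuit, the dictionary update is a least-squares step (method of optimal directions), and the classifier is a nearest-centroid rule. All function names and parameter values are hypothetical.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: code x with at most k atoms of D."""
    residual, support = x.copy(), []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # no new atom helps, stop early
            break
        support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def encode(D, X, k):
    """Sparse codes for all signals (one signal per column of X)."""
    return np.column_stack([omp(D, x, k) for x in X.T])

def learn_dictionary(X, n_atoms, k, n_iter=10, seed=0):
    """Alternate OMP coding with a least-squares (MOD) dictionary update."""
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        A = encode(D, X, k)
        D = X @ np.linalg.pinv(A)
        # Re-seed atoms that no signal used, then renormalise all atoms.
        dead = np.linalg.norm(D, axis=0) < 1e-10
        D[:, dead] = X[:, rng.choice(X.shape[1], int(dead.sum()))]
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D

def make_windows(n, rng, length=64):
    """Assumed stand-in data: noisy sinusoids at one of two rates."""
    t = np.arange(length) / 32.0
    y = rng.integers(0, 2, n)
    freq = np.where(y == 0, 1.0, 1.6)
    phase = rng.uniform(0, 2 * np.pi, n)
    X = np.sin(2 * np.pi * freq[None, :] * t[:, None] + phase[None, :])
    return X + 0.1 * rng.standard_normal(X.shape), y

rng = np.random.default_rng(0)
X_lab, y_lab = make_windows(40, rng)      # small labeled set
X_unl, _ = make_windows(200, rng)         # unlabeled set: labels discarded
X_test, y_test = make_windows(40, rng)

# Semi-supervised step: the dictionary sees labeled and unlabeled signals.
D = learn_dictionary(np.hstack([X_lab, X_unl]), n_atoms=16, k=3)

# Supervised step: only the labeled codes are used to fit the classifier.
F_lab = np.abs(encode(D, X_lab, 3))
F_test = np.abs(encode(D, X_test, 3))
centroids = np.stack([F_lab[:, y_lab == c].mean(axis=1) for c in (0, 1)])
dists = ((F_test.T[:, None, :] - centroids[None]) ** 2).sum(axis=2)
accuracy = float((dists.argmin(axis=1) == y_test).mean())
print(f"nearest-centroid accuracy on sparse codes: {accuracy:.2f}")
```

The point of the sketch is the split of roles: the unlabeled windows influence only the dictionary, while class labels enter only when the classifier is fitted on the sparse codes. Learning the dictionary from `X_lab` alone would give the purely supervised baseline for comparison.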



The presented work was developed within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG). The work of Markus Kächele is supported by a scholarship of the Landesgraduiertenförderung Baden-Württemberg at Ulm University.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Institute of Neural Information Processing, Ulm University, Ulm, Germany
