Kalman Filter Based Classifier Fusion for Affective State Recognition

  • Michael Glodek
  • Stephan Reuter
  • Martin Schels
  • Klaus Dietmayer
  • Friedhelm Schwenker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7872)


The combination of classifier decisions is a common approach to improving classification performance [1–3]. However, the non-stationary fusion of decisions still receives only marginal attention as a research topic, even though more and more classifier systems are deployed in real-time applications. In this work, we study Kalman filters [4] as combiners for temporally ordered classifier decisions. The Kalman filter is a linear dynamical system based on a Markov model. It can combine a variable number of measurements (decisions) and handle sensor failures within a unified framework. The Kalman filter is analyzed in the setting of multi-modal emotion recognition using data from the Audio/Visual Emotion Challenge (AVEC) 2011 [5, 6]. The results show that the Kalman filter is well suited for real-time non-stationary classifier fusion. Combining the available sequential uni- and multi-modal decisions not only yields a consistent, continuous stream of decisions but also leads to significant improvements over the performance of the input decisions.
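
To make the idea of a Kalman filter acting as a decision combiner more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a scalar state (a single class-membership value), a random-walk transition model, and fixed noise variances; the function name `kalman_fuse` and the parameters `q`, `r`, `x0`, and `p0` are hypothetical choices made for illustration. Each time step may carry any number of classifier decisions, and a missing decision (sensor failure) is represented by `None`.

```python
from typing import List, Optional, Sequence, Tuple


def kalman_fuse(
    decisions: Sequence[Sequence[Optional[float]]],
    q: float = 0.01,  # process noise variance (assumed value)
    r: float = 0.25,  # observation noise variance (assumed value)
    x0: float = 0.5,  # initial estimate (assumed value)
    p0: float = 1.0,  # initial estimate variance (assumed value)
) -> List[Tuple[float, float]]:
    """Fuse temporally ordered classifier decisions with a scalar Kalman filter.

    decisions[t] holds the decisions available at time step t. An entry of
    None, or an empty step, models a classifier that delivered no output.
    Returns one (estimate, variance) pair per time step.
    """
    x, p = x0, p0
    fused: List[Tuple[float, float]] = []
    for step in decisions:
        # Prediction: under a random-walk model the estimate is carried
        # over unchanged and only its uncertainty grows.
        p = p + q
        # Correction: apply one scalar update per available decision.
        for z in step:
            if z is None:
                continue  # sensor failure: keep the prediction
            k = p / (p + r)      # Kalman gain
            x = x + k * (z - x)  # move the estimate towards the decision
            p = (1.0 - k) * p    # uncertainty shrinks after an update
        fused.append((x, p))
    return fused


if __name__ == "__main__":
    # Two classifiers; the second one fails at the middle time step,
    # and no classifier delivers a decision at the last time step.
    stream = [[0.9, 0.7], [0.8, None], []]
    for t, (estimate, variance) in enumerate(kalman_fuse(stream)):
        print(t, round(estimate, 3), round(variance, 3))
```

In this sketch, time steps without any decision only run the prediction, so the estimate is held while its variance increases. That is one simple way to obtain a continuous decision stream despite missing inputs; a practical system would likely use per-classifier noise variances rather than a single shared `r`.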


Kalman Filter, Emotion Recognition, Observation Noise, Dead Reckoning, State Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



  1. Beal, M.J., Attias, H., Jojic, N.: Audio-video sensor fusion with probabilistic graphical models. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part I. LNCS, vol. 2350, pp. 736–750. Springer, Heidelberg (2002)
  2. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley (2004)
  3. Ruta, D., Gabrys, B.: An overview of classifier fusion methods. Computing and Information Systems 7(1), 1–10 (2000)
  4. Kalman, R.E.: A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering 82(Series D), 35–45 (1960)
  5. Schuller, B., Valstar, M., Eyben, F., McKeown, G., Cowie, R., Pantic, M.: AVEC 2011–the first international audio/visual emotion challenge. In: D’Mello, S., Graesser, A., Schuller, B., Martin, J.-C. (eds.) ACII 2011, Part II. LNCS, vol. 6975, pp. 415–424. Springer, Heidelberg (2011)
  6. McKeown, G., Valstar, M., Cowie, R., Pantic, M.: The SEMAINE corpus of emotionally coloured character interactions. In: Proceedings of the International Conference on Multimedia and Expo (ICME), pp. 1079–1084. IEEE (2010)
  7. Glodek, M., Scherer, S., Schwenker, F.: Conditioned hidden Markov model fusion for multimodal classification. In: Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), pp. 2269–2272. ISCA (2011)
  8. Schwenker, F., Dietrich, C.R., Thiel, C., Palm, G.: Learning of decision fusion mappings for pattern recognition. Journal on Artificial Intelligence and Machine Learning (AIML) 6, 17–22 (2006)
  9. Jeon, B., Landgrebe, D.A.: Decision fusion approach for multitemporal classification. IEEE Transactions on Geoscience and Remote Sensing 37(3), 1227–1233 (1999)
  10. Glodek, M., Schels, M., Palm, G., Schwenker, F.: Multi-modal fusion based on classification using rejection option and Markov fusion network. In: Proceedings of the International Conference on Pattern Recognition (ICPR), pp. 1084–1087. IEEE (2012)
  11. Glodek, M., Tschechne, S., Layher, G., Schels, M., Brosch, T., Scherer, S., Kächele, M., Schmidt, M., Neumann, H., Palm, G., Schwenker, F.: Multiple classifier systems for the classification of audio-visual emotional states. In: D’Mello, S., Graesser, A., Schuller, B., Martin, J.-C. (eds.) ACII 2011, Part II. LNCS, vol. 6975, pp. 359–368. Springer, Heidelberg (2011)
  12. Picard, R.: Affective computing: Challenges. International Journal of Human-Computer Studies 59(1), 55–64 (2003)
  13. Tao, J., Tan, T.: Affective computing: A review. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS, vol. 3784, pp. 981–995. Springer, Heidelberg (2005)
  14. Scherer, S., Glodek, M., Layher, G., Schels, M., Schmidt, M., Brosch, T., Tschechne, S., Schwenker, F., Neumann, H., Palm, G.: A generic framework for the inference of user states in human computer interaction: How patterns of low level communicational cues support complex affective states. Journal on Multimodal User Interfaces 6(3-4), 117–141 (2012)
  15. Douglas-Cowie, E., Campbell, N., Cowie, R., Roach, P.: Emotional speech: Towards a new generation of databases. Speech Communication 40(1), 33–60 (2003)
  16. Frank, C., Adelhardt, J., Batliner, A., Nöth, E., Shi, R.P., Zeißler, V., Niemann, H.: The facial expression module. SmartKom: Foundations of Multimodal Dialogue Systems 1, 167–180 (2006)
  17. Kim, J., André, E.: Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2067–2083 (2008)
  18. Palm, G., Glodek, M.: Towards emotion recognition in human computer interaction. In: Apolloni, B., Bassis, S., Esposito, A., Morabito, F.C. (eds.) Neural Nets and Surroundings. SIST, vol. 19, pp. 323–336. Springer, Heidelberg (2013)
  19. Blackman, S., Popoli, R.: Design and Analysis of Modern Tracking Systems. Artech House Publishers (1999)
  20. Bar-Shalom, Y., Li, X.R.: Estimation and Tracking: Principles, Techniques, and Software. Artech House Incorporated (1993)
  21. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
  22. Huang, X., Acero, A., Hon, H., et al.: Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall (2001)
  23. Bicego, M., Murino, V., Figueiredo, M.A.T.: Similarity-based clustering of sequences using hidden Markov models. In: Perner, P., Rosenfeld, A. (eds.) MLDM 2003. LNCS (LNAI), vol. 2734, pp. 86–95. Springer, Heidelberg (2003)
  24. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
  25. Littlewort, G., Whitehill, J., Wu, T., Fasel, I., Frank, M., Movellan, J., Bartlett, M.: The computer expression recognition toolbox (CERT). In: Proceedings of the International Conference on Automatic Face & Gesture Recognition and Workshops, pp. 298–305. IEEE (2011)
  26. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
  27. Schwenker, F., Scherer, S., Schmidt, M., Schels, M., Glodek, M.: Multiple classifier systems for the recognition of human emotions. In: El Gayar, N., Kittler, J., Roli, F. (eds.) MCS 2010. LNCS, vol. 5997, pp. 315–324. Springer, Heidelberg (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Michael Glodek (1)
  • Stephan Reuter (2)
  • Martin Schels (1)
  • Klaus Dietmayer (2)
  • Friedhelm Schwenker (1)

  1. Institute of Neural Information Processing, University of Ulm, Ulm, Germany
  2. Institute of Measurement, Control and Microtechnology, University of Ulm, Ulm, Germany
