Multimedia Systems, Volume 18, Issue 3, pp 231–250

Semi-supervised context adaptation: case study of audience excitement recognition

  • Elena Vildjiounaite
  • Vesa Kyllönen
  • Satu-Marja Mäkelä
  • Olli Vuorinen
  • Tommi Keränen
  • Johannes Peltola
  • Georgy Gimel’farb
Regular Paper

Abstract

To recognise the same human reaction (for example, strong excitement) in different contexts, the customary behaviours in these contexts have to be taken into account; e.g. a happy sports audience may cheer for a long time, while a happy theatre audience may produce only short bursts of laughter so as not to interrupt the performance. Tailoring recognition algorithms to contexts can be achieved by building either a context-specific or a generic system. The former is trained individually for each context to recognise its set of characteristic responses, whereas the latter adapts to a context via a significantly more lightweight modification of parameters. This paper follows the latter approach and proposes a simple modification of a hidden Markov model (HMM) classifier that enables end users to adapt the generic system to a context, or to the personal perception of an annotator, by labelling a fairly small number of data samples in each context. To adapt better to this limited number of user annotations, the proposed semi-supervised HMM classifier employs the maximum posterior marginal decision rule rather than the more conventional maximum a posteriori rule. The proposed user- and context-adaptable semi-supervised HMM classifier was tested on recognising the excitement of a show audience in three contexts (a concert hall, a circus, and a sport event) that differ in how excitement is expressed. In our experiments the proposed classifier recognised reactions of a non-neutral audience with 10% higher accuracy than conventional HMM- and support vector machine-based classifiers.
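
To make the distinction between the two decision rules concrete, the following is a minimal sketch in standard HMM notation (not taken from the paper itself): for a hidden state sequence x_1,…,x_T and an observation sequence y_1,…,y_T, the maximum a posteriori (MAP) rule commits to the single most probable state sequence, whereas the maximum posterior marginal (MPM) rule chooses, at each time step, the state with the highest posterior marginal probability.

    \hat{x}^{\mathrm{MAP}}_{1:T} = \arg\max_{x_{1:T}} P(x_{1:T} \mid y_{1:T})          % joint rule, typically computed with the Viterbi algorithm

    \hat{x}^{\mathrm{MPM}}_{t} = \arg\max_{x_t} P(x_t \mid y_{1:T}), \quad t = 1,\dots,T   % per-step rule, computed from forward–backward posteriors

Because each marginal posterior pools probability mass over all state sequences passing through a given state at time t, the MPM rule does not have to commit to one full path; this is one common motivation for preferring it when training or adaptation data are sparse, which is consistent with the limited-annotation setting described in the abstract.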

Keywords

Context adaptation · Audience responses · Hidden Markov models · Semi-supervised learning

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Elena Vildjiounaite (1)
  • Vesa Kyllönen (1)
  • Satu-Marja Mäkelä (1)
  • Olli Vuorinen (1)
  • Tommi Keränen (1)
  • Johannes Peltola (1)
  • Georgy Gimel’farb (2)
  1. VTT Technical Research Centre of Finland, Oulu, Finland
  2. The University of Auckland, Auckland, New Zealand