Audiovisual Conflict Detection in Political Debates

  • Yannis Panagakis (corresponding author)
  • Stefanos Zafeiriou
  • Maja Pantic
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8925)


In this paper, the problem of conflict detection in audiovisual recordings of political debates is investigated. In contrast to the current state of the art in social signal processing, where only the audio modality is employed to analyse human non-verbal behaviour, we propose to additionally use visual features that capture facial behavioural cues related to conflict, such as head nodding, fidgeting, and frowning. To this end, a dataset of video excerpts from televised political debates, where conflicts naturally arise, is introduced. The conflict level (i.e., conflict/non-conflict) is predicted by applying a linear support vector machine and a collaborative representation-based classifier to audio, visual, and audiovisual features. The experimental results demonstrate that fusing audio and visual features yields higher conflict-detection accuracy than features drawn from a single modality (i.e., either audio or video alone).
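The abstract describes feature-level fusion of audio and visual descriptors before classification. As a minimal sketch of that idea — assuming each debate clip yields one audio and one visual feature vector, and using per-dimension z-normalisation before concatenation (the exact normalisation and classifier are not specified in the abstract; the paper itself uses LIBSVM and a collaborative representation-based classifier, not shown here):

```python
# Sketch of feature-level (early) fusion for audiovisual conflict detection.
# The z-normalisation step and the toy feature values are illustrative
# assumptions, not the authors' exact pipeline.

def znorm(vectors):
    """Z-normalise each feature dimension across all clips."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((v[d] - means[d]) ** 2 for v in vectors) / n
        stds.append(var ** 0.5 or 1.0)  # guard against zero variance
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in vectors]

def fuse(audio_feats, visual_feats):
    """Normalise each modality separately, then concatenate per clip."""
    a = znorm(audio_feats)
    v = znorm(visual_feats)
    return [ai + vi for ai, vi in zip(a, v)]

# Two hypothetical clips: 2-D audio features, 1-D visual feature each.
audio = [[0.2, 1.1], [0.9, 3.0]]
visual = [[5.0], [7.0]]
fused = fuse(audio, visual)  # two 3-D fused vectors, ready for an SVM
```

Normalising each modality separately before concatenation prevents the modality with the larger dynamic range from dominating the fused representation; the fused vectors can then be fed to any linear classifier.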


Keywords: Video Frame, Political Debate, Head Nodding, Audio Feature, Active Appearance Model



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yannis Panagakis¹ (corresponding author)
  • Stefanos Zafeiriou¹
  • Maja Pantic¹,²
  1. Department of Computing, Imperial College London, London, UK
  2. EEMCS, University of Twente, Enschede, The Netherlands
