Be at Odds? Deep and Hierarchical Neural Networks for Classification and Regression of Conflict in Speech

Brueckner, Raymond; Schuller, Björn

doi:10.1007/978-3-319-14081-0_19


Chapter
First Online: 01 January 2015


Part of the book series: Computational Social Sciences (CSS)

Abstract

Conflict is a fundamental phenomenon that inevitably arises in inter-human communication, yet it has only recently become a subject of study in the emerging field of computational paralinguistics. As speech is a predominant carrier of information about the valence and level of conflict, we investigate and demonstrate how deep and hierarchical neural networks, which over the last few years have become the new mainstream paradigm in automatic speech recognition, can be leveraged to automatically classify and predict levels of conflict purely from audio recordings. For this purpose, we adopt a neural network architecture that we have previously applied successfully to another paralinguistics task. On the Conflict Sub-Challenge data set of the Interspeech 2013 Computational Paralinguistics Challenge (ComParE), we obtained the best results reported in the literature so far on both the classification and the regression task. These results demonstrate that deep neural networks are also suitable for predicting conflict levels, in both the classification and the regression setting.

Acknowledgements

The research presented in this publication was conducted while the first author was employed by Nuance Communications Deutschland GmbH.

Author information

Authors and Affiliations

Machine Intelligence & Signal Processing Group, MMK, Technische Universität München, Munich, Germany
Raymond Brueckner & Björn Schuller
Department of Computing, Imperial College London, London, UK
Björn Schuller

Corresponding author

Correspondence to Raymond Brueckner.

Editor information

Editors and Affiliations

Uninettuno University, Roma, Italy
Francesca D'Errico
Università Roma Tre, Roma, Italy
Isabella Poggi
Department of Computing Science, University of Glasgow, Glasgow, United Kingdom
Alessandro Vinciarelli
Department of Education Sciences, Università di Macerata, Roma, Italy
Laura Vincze

About this chapter

Cite this chapter

Brueckner, R., Schuller, B. (2015). Be at Odds? Deep and Hierarchical Neural Networks for Classification and Regression of Conflict in Speech. In: D'Errico, F., Poggi, I., Vinciarelli, A., Vincze, L. (eds) Conflict and Multimodal Communication. Computational Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-14081-0_19

DOI: https://doi.org/10.1007/978-3-319-14081-0_19
Published: 14 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14080-3
Online ISBN: 978-3-319-14081-0
eBook Packages: Computer Science, Computer Science (R0)