Speaker-Independent Multimodal Sentiment Analysis for Big Data

Cambria, Erik; Poria, Soujanya; Hussain, Amir

doi:10.1007/978-3-319-97598-6_2

Chapter
First Online: 19 July 2019

Abstract

In this chapter, we propose a contextual multimodal sentiment analysis framework that outperforms the state of the art. We evaluate the framework in both speaker-dependent and speaker-independent settings and address the generalizability of the proposed method. We also discuss an important design choice for any multimodal information processing system: the type of information fusion technique used to combine the multimodal data.

Notes

1. We have reimplemented the method by Poria et al. [23].
2. RNTN classifies it as neutral, as can be seen at http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

References

Cambria, E., Das, D., Bandyopadhyay, S., Feraco, A.: A Practical Guide to Sentiment Analysis. Springer, Cham (2017)
Poria, S., Cambria, E., Bajpai, R., Hussain, A.: A review of affective computing: from unimodal analysis to multimodal fusion. Inf. Fusion. 37, 98–125 (2017)
Poria, S., Cambria, E., Hazarika, D., Mazumder, N., Zadeh, A., Morency, L.-P.: Context-dependent sentiment analysis in user-generated videos. ACL. 2, 873–883 (2017)
Chaturvedi, I., Ragusa, E., Gastaldo, P., Zunino, R., Cambria, E.: Bayesian network based extreme learning machine for subjectivity detection. J. Franklin Inst. 355(4), 1780–1797 (2018)
Cambria, E., Poria, S., Hazarika, D., Kwok, K.: SenticNet 5: discovering conceptual primitives for sentiment analysis by means of context embeddings. In: AAAI, pp. 1795–1802 (2018)
Oneto, L., Bisio, F., Cambria, E., Anguita, D.: Statistical learning theory and ELM for big social data analysis. IEEE Comput. Intell. Mag. 11(3), 45–55 (2016)
Cambria, E., Hussain, A.: Sentic Computing: A Common-Sense-Based Framework for Concept-Level Sentiment Analysis. Springer, Cham (2015)
Cambria, E., Poria, S., Gelbukh, A., Thelwall, M.: Sentiment analysis is a big suitcase. IEEE Intell. Syst. 32(6), 74–80 (2017)
Poria, S., Chaturvedi, I., Cambria, E., Bisio, F.: Sentic LDA: improving on LDA with semantic similarity for aspect-based sentiment analysis. In: IJCNN, pp. 4465–4473 (2016)
Ma, Y., Cambria, E., Gao, S.: Label embedding for zero-shot fine-grained named entity typing. In: COLING, pp. 171–180 (2016)
Xia, Y., Cambria, E., Hussain, A., Zhao, H.: Word polarity disambiguation using Bayesian model and opinion-level features. Cogn. Comput. 7(3), 369–380 (2015)
Zhong, X., Sun, A., Cambria, E.: Time expression analysis and recognition using syntactic token types and general heuristic rules. In: ACL, pp. 420–429 (2017)
Majumder, N., Poria, S., Gelbukh, A., Cambria, E.: Deep learning-based document modeling for personality detection from text. IEEE Intell. Syst. 32(2), 74–79 (2017)
Poria, S., Cambria, E., Hazarika, D., Vij, P.: A deeper look into sarcastic tweets using deep convolutional neural networks. In: COLING, pp. 1601–1612 (2016)
Xing, F., Cambria, E., Welsch, R.: Natural language based financial forecasting: a survey. Artif. Intell. Rev. 50(1), 49–73 (2018)
Ebrahimi, M., Hossein, A., Sheth, A.: Challenges of sentiment analysis for dynamic events. IEEE Intell. Syst. 32(5), 70–75 (2017)
Cambria, E., Hussain, A., Durrani, T., Havasi, C., Eckl, C., Munro, J.: Sentic computing for patient centered application. In: IEEE ICSP, pp. 1279–1282 (2010)
Valdivia, A., Luzon, V., Herrera, F.: Sentiment analysis in tripadvisor. IEEE Intell. Syst. 32(4), 72–77 (2017)
Cavallari, S., Zheng, V., Cai, H., Chang, K., Cambria, E.: Learning community embedding with community detection and node embedding on graphs. In: CIKM, pp. 377–386 (2017)
Mihalcea, R., Garimella, A.: What men say, what women hear: finding gender-specific meaning shades. IEEE Intell. Syst. 31(4), 62–67 (2016)
Pérez-Rosas, V., Mihalcea, R., Morency, L.-P.: Utterance-level multimodal sentiment analysis. ACL. 1, 973–982 (2013)
Wollmer, M., Weninger, F., Knaup, T., Schuller, B., Sun, C., Sagae, K., Morency, L.-P.: Youtube movie reviews: Sentiment analysis in an audio-visual context. IEEE Intell. Syst. 28(3), 46–53 (2013)
Poria, S., Cambria, E., Gelbukh, A.: Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis. In: Proceedings of EMNLP, pp. 2539–2544 (2015)
Zeng, Z., Pantic, M., Roisman, G.I., Huang, T.S.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)
D’mello, S.K., Kory, J.: A review and meta-analysis of multimodal affect detection systems. ACM Comput. Surv. 47(3), 43–79 (2015)
Rosas, V., Mihalcea, R., Morency, L.-P.: Multimodal sentiment analysis of spanish online videos. IEEE Intell. Syst. 28(3), 38–45 (2013)
Sarkar, C., Bhatia, S., Agarwal, A., Li, J.: Feature analysis for computational personality recognition using youtube personality data set. In: Proceedings of the 2014 ACM Multi Media on Workshop on Computational Personality Recognition, pp. 11–14. ACM (2014)
Poria, S., Cambria, E., Hussain, A., Huang, G.-B.: Towards an intelligent framework for multimodal affective data analysis. Neural Netw. 63, 104–116 (2015)
Monkaresi, H., Sazzad Hussain, M., Calvo, R.A.: Classification of affects using head movement, skin color features and physiological signals. In: 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2664–2669. IEEE (2012)
Wang, S., Zhu, Y., Wu, G., Ji, Q.: Hybrid video emotional tagging using users’ EEG and video content. Multimed. Tools Appl. 72(2), 1257–1283 (2014)
Alam, F., Riccardi, G.: Predicting personality traits using multimodal information. In: Proceedings of the 2014 ACM Multi Media on Workshop on Computational Personality Recognition, pp. 15–18. ACM (2014)
Cai, G., Xia, B.: Convolutional neural networks for multimedia sentiment analysis. In: National CCF Conference on Natural Language Processing and Chinese Computing, pp. 159–167. Springer (2015)
Yamasaki, T., Fukushima, Y., Furuta, R., Sun, L., Aizawa, K., Bollegala, D.: Prediction of user ratings of oral presentations using label relations. In: Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia, pp. 33–38. ACM (2015)
Glodek, M., Reuter, S., Schels, M., Dietmayer, K., Schwenker, F.: Kalman filter based classifier fusion for affective state recognition. In: Multiple Classifier Systems, pp. 85–94. Springer (2013)
Dobrišek, S., Gajšek, R., Mihelič, F., Pavešić, N., Štruc, V.: Towards efficient multi-modal emotion recognition. Int. J. Adv. Rob. Syst. 10, 53 (2013)
Mansoorizadeh, M., Charkari, N.M.: Multimodal information fusion application to human emotion recognition from face and speech. Multimed. Tools Appl. 49(2), 277–297 (2010)
Poria, S., Cambria, E., Howard, N., Huang, G.-B., Hussain, A.: Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing. 174, 50–59 (2016)
Lin, J.-C., Wu, C.-H., Wei, W.-L.: Error weighted semi-coupled hidden markov model for audio-visual emotion recognition. IEEE Trans. Multimed. 14(1), 142–156 (2012)
Lu, K., Jia, Y.: Audio-visual emotion recognition with boosted coupled hmm. In: 21st International Conference on Pattern Recognition (ICPR), IEEE 2012, pp. 1148–1151 (2012)
Metallinou, A., Wöllmer, M., Katsamanis, A., Eyben, F., Schuller, B., Narayanan, S.: Context-sensitive learning for enhanced audiovisual emotion classification. IEEE Trans. Affect. Comput. 3(2), 184–198 (2012)
Baltrusaitis, T., Banda, N., Robinson, P.: Dimensional affect recognition using continuous conditional random fields. In: Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on IEEE, pp. 1–8 (2013)
Wöllmer, M., Kaiser, M., Eyben, F., Schuller, B., Rigoll, G.: Lstm-modeling of continuous emotions in an audiovisual affect recognition framework. Image Vis. Comput. 31(2), 153–163 (2013)
Song, M., Jiajun, B., Chen, C., Li, N.: Audio-visual based emotion recognition-a new approach. Comput. Vis. Pattern Recognit. 2, II–1020 (2004)
Zeng, Z., Hu, Y., Liu, M., Fu, Y., Huang, T.S.: Training combination strategy of multi-stream fused hidden markov model for audio-visual affect recognition. In: Proceedings of the 14th Annual ACM International Conference on Multimedia, pp. 65–68. ACM (2006)
Caridakis, G., Malatesta, L., Kessous, L., Amir, N., Raouzaiou, A., Karpouzis, K.: Modeling naturalistic affective states via facial & vocal expressions recognition. In: Proceedings of the 8th International Conference on Multimodal Interfaces, pp. 146–154. ACM (2006)
Petridis, S., Pantic, M.: Audiovisual discrimination between laughter and speech. In: International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008. IEEE, pp. 5117–5120 (2008)
Sebe, N., Cohen, I., Gevers, T., Huang, T.S.: Emotion recognition based on joint visual and audio cues. In: 18th International Conference on Pattern Recognition, ICPR 2006, IEEE, vol. 1, pp. 1136–1139 (2006)
Atrey, P.K., Anwar Hossain, M., Saddik, A.E., Kankanhalli, M.S.: Multimodal fusion for multimedia analysis: a survey. Multimed. Syst. 16(6), 345–379 (2010)
Corradini, A., Mehta, M., Bernsen, N.O., Martin, J., Abrilian, S.: Multimodal input fusion in human-computer interaction. Comput. Syst. Sci. 198, 223 (2005)
Iyengar, G., Nock, H.J., Neti, C.: Audio-visual synchrony for detection of monologues in video archives. In: Proceedings of International Conference on Multimedia and Expo, ICME’03, IEEE, vol. 1, pp. 772–775 (2003)
Adams, W.H., Iyengar, G., Lin, C.-Y., Naphade, M.R., Neti, C., Nock, H.J., Smith, J.R.: Semantic indexing of multimedia content using visual, audio & text cues. EURASIP J. Adv. Signal Process. 2003(2), 1–16 (2003)
Nefian, A.V., Liang, L., Pi, X., Liu, X., Murphy, K.: Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J. Adv. Signal Process. 2002(11), 1–15 (2002)
Nickel, K., Gehrig, T., Stiefelhagen, R., McDonough, J.: A joint particle filter for audio-visual speaker tracking. In: Proceedings of the 7th International Conference on Multimodal Interfaces, pp. 61–68. ACM (2005)
Potamitis, I., Chen, H., Tremoulis, G.: Tracking of multiple moving speakers with multiple microphone arrays. IEEE Trans. Speech Audio Process. 12(5), 520–529 (2004)
Morency, L.-P., Mihalcea, R., Doshi, P.: Towards multimodal sentiment analysis: harvesting opinions from the web. In: Proceedings of the 13th International Conference on Multimodal Interfaces, pp. 169–176. ACM (2011)
Gunes, H., Pantic, M.: Dimensional emotion prediction from spontaneous head gestures for interaction with sensitive artificial listeners. In: International Conference on Intelligent Virtual Agents, pp. 371–377 (2010)
Valstar, M.F., Almaev, T., Girard, J.M., McKeown, G., Mehu, M., Yin, L., Pantic, M., Cohn, J.F.: Fera 2015-second facial expression recognition and analysis challenge. Automat. Face Gesture Recognit. 6, 1–8 (2015)
Nicolaou, M.A., Gunes, H., Pantic, M.: Automatic segmentation of spontaneous data using dimensional labels from multiple coders. In: Proceedings of LREC Int’l Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, pp. 43–48 (2010)
Chang, K.-H., Fisher, D., Canny, J.: Ammon: a speech analysis library for analyzing affect, stress & mental health on mobile phones. In: Proceedings of PhoneSense (2011)
Castellano, G., Kessous, L., Caridakis, G.: Emotion recognition through multiple modalities: face, body gesture, speech. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer Interaction, pp. 92–103. Springer, Heidelberg (2008)
Eyben, F., Wöllmer, M., Graves, A., Schuller, B., Douglas-Cowie, E., Cowie, R.: On-line emotion recognition in a 3-d activation-valence-time continuum using acoustic and linguistic cues. J. Multimodal User Interfaces. 3(1–2), 7–19 (2010)
Eyben, F., Wöllmer, M., Schuller, B.: Openear—introducing the Munich open-source emotion and affect recognition toolkit. In: 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops 2009, pp. 1–6. IEEE (2009)
Chetty, G., Wagner, M., Goecke, R.: A multilevel fusion approach for audiovisual emotion recognition. In: AVSP, pp. 115–120 (2008)
Zhang, S., Li, L., Zhao, Z.: Audio-visual emotion recognition based on facial expression and affective speech. In: Multimedia and Signal Processing, pp. 46–52. Springer (2012)
Paleari, M., Benmokhtar, R., Huet, B.: Evidence theory-based multimodal emotion recognition. In: International Conference on Multimedia Modeling, pp. 435–446 (2009)
Rahman, T., Busso, C.: A personalized emotion recognition system using an unsupervised feature adaptation scheme. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5117–5120. IEEE (2012)
Jin, Q., Li, C., Chen, S., Wu, H.: Speech emotion recognition with acoustic and lexical features. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015, pp. 4749–4753. IEEE (2015)
Metallinou, A., Lee, S., Narayanan, S.: Audio-visual emotion recognition using Gaussian mixture models for face and voice. In: 10th IEEE International Symposium on ISM 2008, pp. 250–257. IEEE (2008)
Rozgić, V., Ananthakrishnan, S., Saleem, S., Kumar, R., Prasad, R.: Ensemble of svm trees for multimodal emotion recognition. In: Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1–4. IEEE (2012)
DeVault, D., Artstein, R., Benn, G., Dey, T., Fast, E., Gainer, A., Georgila, K., Gratch, J., Hartholt, A., Lhommet, M., et al.: Simsensei kiosk: a virtual human interviewer for healthcare decision support. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pp. 1061–1068 (2014)
Siddiquie, B., Chisholm, D., Divakaran, A.: Exploiting multimodal affect and semantics to identify politically persuasive web videos. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction. ACM, pp. 203–210 (2015)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Eyben, F., Wöllmer, M., Schuller, B.: Opensmile: the Munich versatile and fast open-source audio feature extractor. In: Proceedings of the 18th ACM International Conference on Multimedia. ACM, pp. 1459–1462 (2010)
Baltrušaitis, T., Robinson, P., Morency, L.-P.: 3d constrained local model for rigid and non-rigid facial tracking. In: Computer Vision and Pattern Recognition (CVPR), pp. 2610–2617. IEEE (2012)
Gers, F.: Long Short-Term Memory in Recurrent Neural Networks, Ph.D. thesis, Universität Hannover (2001)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Zhou, P., Shi, W., Tian, J., Qi, Z., Li, B., Hao, H., Xu, B.: Attention-based bidirectional long short-term memory networks for relation classification. In: The 54th Annual Meeting of the Association for Computational Linguistics, pp. 207–213 (2016)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Zadeh, A., Zellers, R., Pincus, E., Morency, L.-P.: Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages. IEEE Intell. Syst. 31(6), 82–88 (2016)
Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J.N., Lee, S., Narayanan, S.S.: Iemocap: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42(4), 335–359 (2008)

Author information

Authors and Affiliations

School of Computer Science and Engineering, NTU, Singapore, Singapore
Erik Cambria & Soujanya Poria
School of Natural Sciences, University of Stirling, Stirling, UK
Amir Hussain

Corresponding author

Correspondence to Erik Cambria.

Editor information

Editors and Affiliations

School of Engineering and Information Technology, University of New South Wales, Canberra, ACT, Australia
Kah Phooi Seng
School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
Li-minn Ang
School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
Alan Wee-Chung Liew
The University of Sydney Business School, University of Sydney, Sydney, NSW, Australia
Junbin Gao

About this chapter

Cite this chapter

Cambria, E., Poria, S., Hussain, A. (2019). Speaker-Independent Multimodal Sentiment Analysis for Big Data. In: Seng, K., Ang, Lm., Liew, AC., Gao, J. (eds) Multimodal Analytics for Next-Generation Big Data Technologies and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-97598-6_2

DOI: https://doi.org/10.1007/978-3-319-97598-6_2
Published: 19 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97597-9
Online ISBN: 978-3-319-97598-6
eBook Packages: Computer Science, Computer Science (R0)
