Abstract
Multimodal sentiment analysis plays an important role in the field of smart education. To achieve high performance on Multimodal Sentiment Analysis (MSA) tasks, a model must effectively capture the information conveyed by each modality's representation. The primary objective is to learn the complementarity and correlation among the various modalities; however, existing methods often fall short in capturing either complementary information or correlated information. Addressing these challenges is therefore crucial for improving the performance of MSA models. To this end, this paper proposes a multi-task multimodal sentiment analysis framework based on low-rank tensor fusion and self-supervision. In this model, low-rank tensor fusion combined with the Mish activation function is used to capture inter-modal correlation information, while a unimodal label generation module combined with the Mish activation function is introduced to capture inter-modal complementary information. Multi-task learning is then applied to combine these two tasks, further enhancing the model's ability to capture information. Furthermore, we conducted comprehensive experiments on two widely used MSA datasets, CMU-MOSI and CMU-MOSEI, to evaluate the performance of our proposed model. The experimental results demonstrate the effectiveness of our model in achieving advanced performance on MSA tasks.
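The two building blocks named in the abstract can be illustrated in a few lines of plain Python. This is a minimal sketch, not the authors' implementation: the Mish activation follows Misra (2019), and `low_rank_fusion` shows the low-rank idea behind tensor fusion for a single output coordinate only (real low-rank fusion produces a vector via factor matrices); all dimensions and weights below are invented for demonstration.

```python
import math

def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))

def low_rank_fusion(modalities, factors):
    """Low-rank multimodal fusion for one output coordinate.

    Instead of materializing the full outer product of the modality
    vectors, each modality is projected with R rank-factor vectors and
    the per-rank scalar projections are multiplied across modalities,
    then summed over ranks.

    modalities: list of M feature vectors (one per modality).
    factors:    factors[m][r] is the rank-r weight vector for modality m.
    """
    num_ranks = len(factors[0])
    fused = 0.0
    for r in range(num_ranks):
        term = 1.0
        for m, features in enumerate(modalities):
            # Scalar projection of modality m under its rank-r factor.
            term *= sum(w * xi for w, xi in zip(factors[m][r], features))
        fused += term
    return fused

# Toy usage: two modalities, rank 1, then the Mish nonlinearity on top.
text_feats, audio_feats = [1.0, 2.0], [3.0]
factors = [[[1.0, 0.0]], [[1.0]]]          # invented rank-1 factors
fused = mish(low_rank_fusion([text_feats, audio_feats], factors))
```

Because each modality is projected before combination, the cost grows linearly in the number of modalities and the rank, rather than exponentially as with a full outer-product tensor.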
Data availability
The datasets analysed during the current study are available in the following public repositories: CMU-MOSI: http://multicomp.cs.cmu.edu/resources/cmu-mosi-dataset/; CMU-MOSEI: http://multicomp.cs.cmu.edu/resources/cmu-mosei-dataset/
References
Tawunrat C, Jeremy E (2015) Simple approaches of sentiment analysis via ensemble learning. In: Information science and applications, Lecture notes in electrical engineering, vol 339. Springer
Baltrušaitis T, Ahuja C, Morency L-P (2018) Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 41(2):423–443
Gandhi A, Adhvaryu K, Poria S, Cambria E, Hussain A (2022) Multimodal sentiment analysis: a systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions. Information Fusion 91:424–444
Cai Y, Yang K, Huang D, Zhou Z, Lei X, Xie H, Wong TL (2019) A hybrid model for opinion mining based on domain sentiment dictionary. International Journal of Machine Learning and Cybernetics 10:2131–2142
Roccetti M, Marfia G, Salomoni P, Prandi C, Zagari RM, Kengni FLG, Bazzoli F, Montagnani M (2017) Attitudes of Crohn’s disease patients: infodemiology case study and sentiment analysis of Facebook and Twitter posts. JMIR Public Health Surveill 3(3):7004
Rao Y, Lei J, Wenyin L, Li Q, Chen M (2014) Building emotional dictionary for sentiment analysis of online news. World Wide Web 17:723–742
Kamal A, Abulaish M (2013) Statistical features identification for sentiment analysis using machine learning techniques. In: 2013 International symposium on computational and business intelligence, pp 178–181 IEEE
Vijayaraghavan S, Basu D (2020) Sentiment analysis in drug reviews using supervised machine learning algorithms. arXiv:2003.11643
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Xing Y, Xiao C, Wu Y, Ding Z (2019) A convolutional neural network for aspect-level sentiment classification. Int J Pattern Recogn Artif Intell 33(14):1959046
Li R, Wu Z, Jia J, Bu Y, Zhao S, Meng H (2019) Towards discriminative representation learning for speech emotion recognition. In: IJCAI, pp 5060–5066
Savargiv M, Bastanfard A (2013) Text material design for fuzzy emotional speech corpus based on persian semantic and structure. In: 2013 International conference on fuzzy theory and its applications (iFUZZY), pp 380–384. IEEE
Gandhi A, Adhvaryu K, Khanduja V (2021) Multimodal sentiment analysis: review, application domains and future directions. In: 2021 IEEE Pune section international conference (PuneCon), pp 1–5. IEEE
Demotte P, Wijegunarathna K, Meedeniya D, Perera I (2021) Enhanced sentiment extraction architecture for social media content analysis using capsule networks. Multimedia Tools Appl 1–26
Poria S, Hazarika D, Majumder N, Mihalcea R (2020) Beneath the tip of the iceberg: current challenges and new directions in sentiment analysis research. IEEE Trans Affect Comput
Tembhurne JV, Diwan T (2021) Sentiment analysis in textual, visual and multimodal inputs using recurrent neural networks. Multimedia Tools Appl 80:6871–6910
Guo W, Wang J, Wang S (2019) Deep multimodal representation learning: a survey. IEEE Access 7:63373–63394
Cao R, Ye C, Zhou H (2021) Multimodel sentiment analysis with self-attention. In: Proceedings of the future technologies conference (FTC) 2020, vol 1, pp 16–26. Springer
Zadeh A, Chen M, Poria S, Cambria E, Morency L–P (2017) Tensor fusion network for multimodal sentiment analysis. arXiv:1707.07250
Yu W, Xu H, Meng F, Zhu Y, Ma Y, Wu J, Zou J, Yang K (2020) CH-SIMS: a Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 3718–3727
Yu W, Xu H, Yuan Z, Wu J (2021) Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence 35:10790–10797
Misra D (2019) Mish: a self regularized non-monotonic activation function. arXiv:1908.08681
Morency LP, Mihalcea R, Doshi P (2011) Towards multimodal sentiment analysis: harvesting opinions from the web. In: Proceedings of the 13th International conference on multimodal interfaces, pp 169–176
Xiao J, Luo X (2022) A survey of sentiment analysis based on multi-modal information. In: 2022 IEEE Asia–Pacific conference on image processing, electronics and computers (IPEC), pp 712–715. IEEE
Zhou S, Jia J, Wang Q, Dong Y, Yin Y, Lei K (2018) Inferring emotion from conversational voice data: a semi–supervised multi-path generative neural network approach. In: Proceedings of the AAAI conference on Artificial Intelligence, vol 32
Zhang K, Zhu Y, Zhang W, Zhang W, Zhu Y (2020) Transfer correlation between textual content to images for sentiment analysis. IEEE Access 8:35276–35289
Dobrišek S, Gajšek R, Mihelič F, Pavešić N, Štruc V (2013) Towards efficient multi-modal emotion recognition. Int J Adv Robot Syst 10(1):53
Poria S, Cambria E, Gelbukh A (2015) Deep convolutional neural network textual features and multiple kernel learning for utterance–level multimodal sentiment analysis. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 2539–2544
Ji R, Chen F, Cao L, Gao Y (2018) Cross-modality microblog sentiment prediction via bi-layer multimodal hypergraph learning. IEEE Transactions on Multimedia 21(4):1062–1075
Akhtar MS, Chauhan DS, Ghosal D, Poria S, Ekbal A, Bhattacharyya P (2019) Multi-task learning for multi-modal emotion recognition and sentiment analysis. arXiv:1905.05812
Tsai Y–HH, Ma MQ, Yang M, Salakhutdinov R, Morency L–P (2020) Multimodal routing: improving local and global interpretability of multimodal language analysis. In: Proceedings of the conference on empirical methods in natural language processing. conference on empirical methods in natural language processing, vol 2020, pp 1823. NIH Public Access
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. In: Proc. of NAACL
Radford A, Narasimhan K, Salimans T, Sutskever I et al (2018) Improving language understanding by generative pre–training
Devlin J, Chang M–W, Lee K, Toutanova K (2018) Bert: pre–training of deep bidirectional transformers for language understanding. arXiv:1810.04805
Gupta B, Prakasam P, Velmurugan T (2022) Integrated BERT embeddings, BiLSTM-BiGRU and 1-D CNN model for binary sentiment classification analysis of movie reviews. Multimedia Tools Appl 81(23):33067–33086
Gao S, Chen X, Ren Z, Zhao D, Yan R (2020) From standard summarization to new tasks and beyond: Summarization with manifold information. arXiv:2005.04684
Hazarika D, Zimmermann R, Poria S (2020) MISA: modality-invariant and -specific representations for multimodal sentiment analysis. In: Proceedings of the 28th ACM international conference on multimedia, pp 1122–1131
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078
Lei T, Zhang Y, Wang SI, Dai H, Artzi Y (2017) Simple recurrent units for highly parallelizable recurrence. arXiv:1709.02755
Zadeh A, Zellers R, Pincus E, Morency L–P (2016) Mosi: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv:1606.06259
Zadeh AB, Liang PP, Poria S, Cambria E, Morency L–P (2018) Multimodal language analysis in the wild: Cmu-mosei dataset and interpretable dynamic fusion graph. In: Proceedings of the 56th annual meeting of the association for computational linguistics (vol 1: Long Papers), pp 2236–2246
Tsai Y–HH, Liang PP, Zadeh A, Morency L–P, Salakhutdinov R (2018) Learning factorized multimodal representations. arXiv:1806.06176
Tsai Y–HH, Bai S, Liang PP, Kolter JZ, Morency L–P, Salakhutdinov R (2019) Multimodal transformer for unaligned multimodal language sequences. In: Proceedings of the conference. Association for Computational Linguistics. Meeting, vol. 2019, p. 6558. NIH Public Access
Rahman W, Hasan MK, Lee S, Zadeh A, Mao C, Morency L–P, Hoque E (2020) Integrating multimodal information in large pretrained transformers. In: Proceedings of the conference. Association for Computational Linguistics. Meeting, vol 2020, pp 2359. NIH Public Access
Han W, Chen H, Poria S (2021) Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis. arXiv:2109.00412
Wang D, Guo X, Tian Y, Liu J, He L, Luo X (2023) Tetfn: a text enhanced transformer fusion network for multimodal sentiment analysis. Pattern Recognit 136:109259
Funding
Not applicable
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethics approval
Not applicable
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Miao, X., Zhang, X. & Zhang, H. Low-rank tensor fusion and self-supervised multi-task multimodal sentiment analysis. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-18032-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11042-023-18032-8