A soft voting ensemble learning-based approach for multimodal sentiment analysis

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Sentiment analysis makes it possible to determine people's feelings and opinions about a subject or product from their social media posts. With the pervasive use of the Internet and smart devices, the data users produce daily can come in different modalities such as text, image, audio, and video. Multimodal sentiment analysis aims to reveal the sentiment of a user's post by analyzing the data in its different modalities as a whole. One of the major challenges of multimodal sentiment analysis is how to combine the sentiment obtained from the individual modalities while preserving the sentiment and meaning of the post. Moreover, many studies apply the same classification method to every modality, even though each classifier may be effective on a different feature set. In this study, a soft voting-based ensemble model is proposed that takes advantage of the complementary performance of different classifiers on different modalities. In the proposed model, deep features are extracted from the multimodal datasets with deep learning methods (BiLSTM, CNN). After feature selection is applied to the fused text and image features, the final feature sets are classified with the soft voting-based ensemble learning model. The performance of the proposed model was evaluated on two benchmark datasets consisting of text–image pairs. The experimental results show that the proposed model outperforms multiple competing models on the same datasets.
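To make the approach concrete, here is a minimal sketch of the pipeline the abstract describes, written with scikit-learn. It is an illustration under stated assumptions, not the paper's implementation: the feature dimensions, the random stand-in data, the SelectKBest selection step, and the choice of base classifiers are all hypothetical.

```python
# Sketch of fusion + feature selection + soft voting (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for deep features: a BiLSTM text vector and a CNN image vector
# per post (dimensions are hypothetical).
text_feats = rng.normal(size=(1000, 300))
image_feats = rng.normal(size=(1000, 128))
y = rng.integers(0, 2, size=1000)  # binary sentiment labels

# Fuse modalities by concatenation, then keep the most informative features.
X_fused = np.concatenate([text_feats, image_feats], axis=1)
X = SelectKBest(f_classif, k=200).fit_transform(X_fused, y)

# Soft voting: average the per-class probabilities of heterogeneous
# classifiers and take the argmax, instead of counting hard votes.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("svc", SVC(probability=True)),  # needed so SVC exposes predict_proba
    ],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```

Because soft voting averages probabilities rather than hard labels, a classifier that is highly confident on a given fused feature vector can outweigh base learners that are uncertain, which is what lets classifiers with different strengths on the text and image features complement one another.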

Author information

Corresponding author

Correspondence to Mehmet Umut Salur.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Salur, M.U., Aydın, İ. A soft voting ensemble learning-based approach for multimodal sentiment analysis. Neural Comput & Applic 34, 18391–18406 (2022). https://doi.org/10.1007/s00521-022-07451-7
