
Image-text interaction graph neural network for image-text sentiment analysis

Applied Intelligence


With the rapid development of social platforms, the volume of user-generated image-text content has grown quickly, and image-text sentiment analysis of social media has attracted great interest from researchers in recent years. The main challenge of image-text sentiment analysis is to construct a model that exploits the complementarity between image and text. Most previous studies simply merged images and text without fully considering the interaction between them. This paper proposes an image-text interaction graph neural network for image-text sentiment analysis. A text-level graph neural network extracts the text features, and a pre-trained convolutional neural network extracts the image features. An image-text interaction graph network is then constructed: its node features are initialized with the text features and the image features, and they are updated with a graph attention mechanism. Finally, an image-text aggregation layer is applied to perform sentiment classification. Experimental results show that the proposed method is more effective than existing methods. In addition, we built a large-scale Twitter image-text sentiment analysis dataset and used it in the experiments.
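The pipeline described in the abstract (nodes initialized from text and image features, a graph attention update, then an aggregation layer for classification) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a fully connected graph with self-loops, a single attention head in the style of graph attention networks, a `tanh` nonlinearity, mean pooling as the aggregation step, and random weights and features in place of trained ones. All names (`gat_layer`, `classify`, `text_nodes`, `image_node`) and all dimensions are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(nodes, W, a):
    """One single-head graph attention update.

    Assumption: every node attends to every node (including itself);
    how the paper actually connects nodes is not stated in the abstract.
    """
    projected = [matvec(W, h) for h in nodes]
    updated, attention = [], []
    for hi in projected:
        # Unnormalized score e_ij = LeakyReLU(a . [W h_i || W h_j])
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, hi + hj)))
            for hj in projected
        ]
        alpha = softmax(scores)  # attention weights over neighbors j
        attention.append(alpha)
        # New feature of node i: attention-weighted sum of projected neighbors
        agg = [
            sum(al * hj[k] for al, hj in zip(alpha, projected))
            for k in range(len(hi))
        ]
        updated.append([math.tanh(v) for v in agg])
    return updated, attention


def classify(nodes, W_out):
    """Assumed aggregation: mean-pool node features, then linear + softmax."""
    pooled = [sum(col) / len(nodes) for col in zip(*nodes)]
    return softmax(matvec(W_out, pooled))


rng = random.Random(0)
d_in, d_hid, n_classes = 4, 3, 3  # hypothetical sizes


def rand_vec(n):
    return [rng.uniform(-1, 1) for _ in range(n)]


# Stand-ins for features from the text-level GNN (one per word) and the CNN.
text_nodes = [rand_vec(d_in) for _ in range(5)]
image_node = rand_vec(d_in)
nodes = text_nodes + [image_node]

W = [rand_vec(d_in) for _ in range(d_hid)]       # shared projection
a = rand_vec(2 * d_hid)                          # attention vector
W_out = [rand_vec(d_hid) for _ in range(n_classes)]

updated, attention = gat_layer(nodes, W, a)
probs = classify(updated, W_out)
```

Each row of `attention` shows how strongly one node (a word or the image) weights every other node, which is where the image-text interaction enters in this sketch. `probs` is a distribution over three hypothetical sentiment classes.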





This work was partially supported by the Natural Science Foundations of Guangdong Province, China (2019A1515011056, 2018A030310540), the National Natural Science Foundation of China (61701122), the Key Technology Projects in HighTech Industrial Field of Qingyuan (No. 2020KJJH039), and the Major Science and Technology Projects of Zhongshan, China (191021082628279).

Author information


Corresponding authors

Correspondence to Bi Zeng or Jianqi Liu.



About this article


Cite this article

Liao, W., Zeng, B., Liu, J. et al. Image-text interaction graph neural network for image-text sentiment analysis. Appl Intell 52, 11184–11198 (2022).
