
ModalNet: an aspect-level sentiment classification model by exploring multimodal data with fusion discriminant attentional network


Aspect-level sentiment classification aims to identify the sentiment polarity of each aspect mentioned in a sentence. In the past, such analysis relied mainly on text data. Today, with the popularization of smart devices and Internet services, people generate far more varied data, including text, images, and video. Multimodal data from the same post (e.g., a tweet) are usually correlated. For example, an image may complement the accompanying text, and processing such multimodal data appropriately can yield richer information for sentiment analysis. To this end, we propose ModalNet, an aspect-level sentiment classification model that explores multimodal data with a fusion discriminant attentional network. Specifically, we first use two memory networks to mine the intra-modality information of text and image, and then design a discriminant matrix to supervise the fusion of inter-modality information. Experimental results demonstrate the effectiveness of the proposed model.
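The two-stage idea described above (attention over a per-modality memory, followed by a supervised fusion step) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the form of the discriminant matrix, so the `gate_matrix` below is a hypothetical stand-in that simply gates how much image information is added to the text representation. The function names `attend` and `fuse` and all toy values are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(memory, query):
    """One memory-network hop: weight each memory slot by its
    similarity to the aspect query and return the weighted sum."""
    weights = softmax([dot(slot, query) for slot in memory])
    return [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(query))
    ]


def fuse(text_vec, image_vec, gate_matrix):
    """Hypothetical gated fusion (an assumption, not the paper's
    discriminant matrix): each row of gate_matrix scores the
    concatenated features, and a sigmoid of that score decides how
    much of the image feature is added to the text feature."""
    joint = text_vec + image_vec  # list concatenation
    fused = []
    for i, row in enumerate(gate_matrix):
        gate = 1.0 / (1.0 + math.exp(-dot(row, joint)))
        fused.append(text_vec[i] + gate * image_vec[i])
    return fused


# Toy example: 2-dimensional features, two memory slots per modality.
aspect = [1.0, 0.0]
text_memory = [[1.0, 0.0], [0.0, 1.0]]
image_memory = [[0.5, 0.5], [0.0, 1.0]]
gate_matrix = [[0.0] * 4, [0.0] * 4]  # all-zero rows give a gate of 0.5

text_repr = attend(text_memory, aspect)
image_repr = attend(image_memory, aspect)
fused_repr = fuse(text_repr, image_repr, gate_matrix)
```

In a trained model the memories would hold learned word and image-region features and the gate parameters would be learned; here they are fixed toy values so that the data flow is easy to follow.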






This work is partially supported by the National Natural Science Foundation of China (No. 61960206008, 62072375), and the Fundamental Research Funds for the Central Universities (No. 3102019AX10).

Author information



Corresponding author

Correspondence to Zhu Wang.


About this article


Cite this article

Zhang, Z., Wang, Z., Li, X. et al. ModalNet: an aspect-level sentiment classification model by exploring multimodal data with fusion discriminant attentional network. World Wide Web (2021).



  • Multimodal data
  • Aspect-level sentiment classification
  • Discriminant attention network
  • Feature fusion