Improving Short Text Modeling by Two-Level Attention Networks for Sentiment Classification

  • Yulong Li
  • Yi Cai
  • Ho-fung Leung
  • Qing Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10827)


Understanding short texts is crucial to many applications, but it remains challenging because the information in short texts is sparse and ambiguous. Moreover, the sentiments expressed in user-generated short texts are often implicit and context-dependent. To address these problems, we propose a novel model based on two-level attention networks for identifying the sentiment of short texts. Our model first applies an attention mechanism to capture local features and long-distance dependency features simultaneously, which makes it more robust to irrelevant information. The attention-based features are then combined non-linearly by a bidirectional recurrent attention network, which enhances the expressive power of the model and automatically captures more relevant feature combinations. We evaluate our model on the MR, SST-1, and SST-2 datasets. The experimental results show that our model outperforms previous methods.
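The two-level pipeline described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact equations, so everything below is an assumption. Level one is modeled as a single scaled dot-product self-attention layer in which every word attends to every word in the sentence, so nearby and distant words can both contribute to a word's feature. Level two is modeled as a bidirectional tanh (Elman) recurrent layer followed by attention pooling with a query vector `u`. The function names (`word_level_attention`, `rnn_pass`, `classify`) and the randomly initialized weights are hypothetical.

```python
# Hypothetical sketch of a two-level attention classifier (not the paper's code).
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W v.
    return [dot(row, v) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def word_level_attention(embs):
    # Level 1 (assumed form): each word attends to every word via scaled
    # dot-product scores, so distant words are as reachable as neighbours.
    d = len(embs[0])
    out = []
    for q in embs:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in embs])
        context = [sum(w * k[i] for w, k in zip(weights, embs)) for i in range(d)]
        out.append(q + context)  # concatenate the word with its attended context
    return out


def rnn_pass(xs, Wx, Wh):
    # Plain tanh (Elman) recurrence; a stand-in for the paper's recurrent unit.
    h = [0.0] * len(Wh)
    states = []
    for x in xs:
        h = [math.tanh(a + b) for a, b in zip(matvec(Wx, x), matvec(Wh, h))]
        states.append(h)
    return states


def classify(embs, n_classes=2, hidden=4, seed=0):
    # Returns class probabilities for one sentence given its word embeddings.
    rng = random.Random(seed)
    feats = word_level_attention(embs)
    d_in = len(feats[0])
    Wx_f, Wh_f = rand_matrix(rng, hidden, d_in), rand_matrix(rng, hidden, hidden)
    Wx_b, Wh_b = rand_matrix(rng, hidden, d_in), rand_matrix(rng, hidden, hidden)
    # Level 2: bidirectional recurrence over the attention-based features.
    fwd = rnn_pass(feats, Wx_f, Wh_f)
    bwd = rnn_pass(feats[::-1], Wx_b, Wh_b)[::-1]  # re-align to forward order
    states = [f + b for f, b in zip(fwd, bwd)]
    # Attention pooling over time with a hypothetical query vector u.
    u = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)]
    alpha = softmax([dot(u, s) for s in states])
    sentence = [sum(a * s[i] for a, s in zip(alpha, states)) for i in range(2 * hidden)]
    W_out = rand_matrix(rng, n_classes, 2 * hidden)
    return softmax(matvec(W_out, sentence))


if __name__ == "__main__":
    rng = random.Random(42)
    # Five words with 6-dimensional random embeddings (placeholder input).
    sentence = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
    print(classify(sentence, n_classes=2))
```

With random weights the printed probabilities carry no meaning; in practice every matrix, including `u`, would be learned from labeled data. Concatenating each word with its attended context, rather than replacing it, is one simple way to keep the word's own information alongside the long-distance signal.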



This work is supported by the Fundamental Research Funds for the Central Universities, SCUT (No. 2017ZD0482015ZM136), the Tiptop Scientific and Technical Innovative Youth Talents of Guangdong special support program (No. 2015TQ01X633), the Science and Technology Planning Project of Guangdong Province, China (No. 2016A030310423), the Science and Technology Program of Guangzhou (International Science and Technology Cooperation Program No. 201704030076), and the Science and Technology Planning Major Project of Guangdong Province (No. 2015A070711001). The work presented in this paper was also partially supported by a CUHK Direct Grant for Research (Project Code EE16963).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Software Engineering, South China University of Technology, Guangzhou, China
  2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong
  3. Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong