
Recurrent networks with attention and convolutional networks for sentence representation and classification

Abstract

In this paper, we propose a bi-attention mechanism, a multi-layer attention mechanism, and a text representation and classification model that combines attention with a convolutional neural network (ACNN). The bi-attention uses two attention mechanisms to learn two context vectors: a forward RNN with attention learns the forward context vector \(\overrightarrow {\mathbf {c}}\), a backward RNN with attention learns the backward context vector \(\overleftarrow {\mathbf {c}}\), and the two are concatenated to form the context vector c. The multi-layer attention is a stack of bi-attention layers. In the ACNN, the context vector c is obtained by the bi-attention, a convolution operation is then performed on c, and a max-pooling operation reduces the dimensionality, converting the text into a low-dimensional sentence vector m. Finally, a softmax classifier is used for text classification. We evaluate our model on 8 benchmark text classification datasets, and it achieves better or comparable performance relative to state-of-the-art methods.
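To make the pipeline described above concrete, the following is a minimal sketch of how the bi-attention and the ACNN classifier could be wired together in PyTorch. The layer sizes, the additive attention scoring, the choice of GRU cells, and all module names are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of the ACNN pipeline from the abstract (assumed configuration,
# not the authors' exact model): bi-attention -> context vector c ->
# convolution -> max-pooling -> sentence vector m -> softmax classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiAttention(nn.Module):
    """Forward and backward RNNs, each with its own attention over hidden states."""
    def __init__(self, embed_dim, hidden_dim):
        super().__init__()
        self.fwd_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.bwd_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fwd_attn = nn.Linear(hidden_dim, 1)
        self.bwd_attn = nn.Linear(hidden_dim, 1)

    @staticmethod
    def _attend(states, scorer):
        # states: (batch, seq_len, hidden); attention weights sum to 1 over seq_len
        weights = F.softmax(scorer(states), dim=1)            # (batch, seq_len, 1)
        return (weights * states).sum(dim=1)                  # weighted context vector

    def forward(self, x):
        # x: (batch, seq_len, embed_dim)
        fwd_states, _ = self.fwd_rnn(x)                       # forward pass over the sequence
        bwd_states, _ = self.bwd_rnn(torch.flip(x, dims=[1])) # backward pass (reversed input)
        c_fwd = self._attend(fwd_states, self.fwd_attn)       # forward context vector
        c_bwd = self._attend(bwd_states, self.bwd_attn)       # backward context vector
        return torch.cat([c_fwd, c_bwd], dim=-1)              # concatenated context vector c

class ACNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128,
                 num_filters=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bi_attention = BiAttention(embed_dim, hidden_dim)
        # Convolution over the context vector c, then max-pooling down to
        # a low-dimensional sentence vector m.
        self.conv = nn.Conv1d(1, num_filters, kernel_size=3, padding=1)
        self.classifier = nn.Linear(num_filters, num_classes)

    def forward(self, tokens):
        x = self.embedding(tokens)                            # (batch, seq_len, embed_dim)
        c = self.bi_attention(x)                              # (batch, 2 * hidden_dim)
        features = self.conv(c.unsqueeze(1))                  # (batch, num_filters, 2 * hidden_dim)
        m = F.max_pool1d(features, features.size(-1)).squeeze(-1)  # sentence vector m
        return F.log_softmax(self.classifier(m), dim=-1)      # softmax classification
```

A forward pass on a batch of token-id tensors of shape (batch, seq_len) returns class log-probabilities, and training would proceed with a negative log-likelihood loss, mirroring the softmax classification step described in the abstract.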



Acknowledgements

This research was supported by the National Natural Science Foundation of China [NSFC61572005], the Fundamental Research Funds for the Central Universities [2016JBM080], and Key Projects of Science and Technology Research of Hebei Province Higher Education [ZD2-017304].

Author information


Corresponding author

Correspondence to Tengfei Liu.


About this article

Cite this article

Liu, T., Yu, S., Xu, B. et al. Recurrent networks with attention and convolutional networks for sentence representation and classification. Appl Intell 48, 3797–3806 (2018). https://doi.org/10.1007/s10489-018-1176-4
