Enhancing sentence embedding with dynamic interaction

  • Jinsong Xie
  • Yongjun Li
  • Qiwei Sun
  • Yi Lin

Abstract

Sentence embedding is a powerful tool in many natural language processing subfields, such as sentiment analysis, natural language inference, and question classification. However, previous work typically builds the final sentence representation by applying average pooling or max pooling to the final states output by the last layer of a multi-layer encoder. Average pooling is simple and fast for summarizing the overall meaning of a sentence, but it may ignore significant latent semantic features, since information flows through multiple layers. In this paper, we propose a new dynamic interaction method for improving the final sentence representation. It makes the states of the last layer more useful to the subsequent classification layer by introducing a constraint derived from the states of the previous layers. The constraint is the product of a dynamic interaction between the states of the intermediate layers and the states of the upper-most layer. Experiments show that our method surpasses prior state-of-the-art sentence embedding methods on four datasets.
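
To make the idea concrete, below is a minimal PyTorch-style sketch of one plausible form of this dynamic interaction: each last-layer state attends over the states of an intermediate layer, and the attended result is fused back into the last-layer states before pooling. The function name, the scaled dot-product scoring, the residual-style fusion, and the concatenation of average and max pooling are illustrative assumptions, not the authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def dynamic_interaction(intermediate, last):
        """Constrain last-layer states with intermediate-layer states.

        intermediate: (batch, seq_len, dim) states from a previous encoder layer
        last:         (batch, seq_len, dim) states from the upper-most layer
        Returns a (batch, 2 * dim) sentence embedding.
        """
        # Each last-layer state attends over the intermediate-layer states
        # (scaled dot-product scoring; the exact scoring function is assumed).
        scores = torch.bmm(last, intermediate.transpose(1, 2))   # (b, seq, seq)
        weights = F.softmax(scores / last.size(-1) ** 0.5, dim=-1)
        constraint = torch.bmm(weights, intermediate)            # (b, seq, dim)

        # Fuse the constraint into the last-layer states, then pool over time.
        fused = last + constraint            # residual-style fusion (assumed)
        avg = fused.mean(dim=1)              # average pooling
        mx, _ = fused.max(dim=1)             # max pooling
        return torch.cat([avg, mx], dim=-1)  # fed to the classification layer

For example, with intermediate and last both of shape (32, 40, 300), the call returns a (32, 600) embedding that the classification layer consumes.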

Keywords

Sentence embedding · Sentiment analysis · Self-attention · Deep neural networks

Acknowledgments

We gratefully acknowledge the support of the National Natural Science Foundation of China (No. 11771152) and the Science and Technology Foundation of Guangdong Province (Nos. 2015B010128008 and 2015B010109006).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
