Self attention mechanism of bidirectional information enhancement

Abstract

The self-attention mechanism is widely used in relation extraction, sentiment classification, and other tasks because it can capture relevant information across a wide span of text. Existing self-attention models use soft attention, that is, a dense attention matrix produced by the softmax function. However, when sentences are long, the weights assigned to important information become too small. In addition, the softmax function implicitly assumes that every element contributes positively to the result, so the model cannot extract information that has a negative effect. We use a hard attention mechanism, namely a sparse attention matrix, to improve the existing self-attention model and to fully extract both the positive and the negative information in a text. Our model not only strengthens the extraction of positive information but also fills the gap left by traditional attention matrices, whose entries cannot be negative. We evaluated our model on three tasks and seven datasets. The experimental results show that our model outperforms the traditional self-attention model and surpasses state-of-the-art models on some tasks.
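Only the abstract is available here, so the following is a minimal, illustrative PyTorch sketch rather than the authors' actual formulation: it contrasts standard soft self-attention (a dense, strictly positive weight matrix from softmax) with one plausible hard-attention variant in which scores stay signed and small-magnitude entries are pruned, giving a sparse matrix that can carry negative weights. The function names, the tanh scoring, and the top-k pruning rule are assumptions made for illustration only.

```python
# Minimal sketch (assumption): dense softmax self-attention vs. a sparse, signed
# attention variant. This is NOT the paper's exact method; names and the
# pruning rule are illustrative.
import torch
import torch.nn.functional as F


def soft_self_attention(x, w_q, w_k, w_v):
    """Standard soft attention: dense weights from softmax, all positive."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                 # each (seq_len, d)
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
    attn = F.softmax(scores, dim=-1)                     # every weight > 0, rows sum to 1
    return attn @ v


def sparse_signed_self_attention(x, w_q, w_k, w_v, keep_ratio=0.5):
    """Illustrative hard-attention variant: scores stay signed (via tanh) and
    only the largest-magnitude entries per row are kept, so the attention
    matrix is sparse and may contain negative weights."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = torch.tanh(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5)  # in (-1, 1)
    n_keep = max(1, int(keep_ratio * scores.size(-1)))
    # Zero out everything except the n_keep entries with the largest magnitude.
    topk = scores.abs().topk(n_keep, dim=-1).indices
    mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
    return (scores * mask) @ v


if __name__ == "__main__":
    seq_len, d = 6, 8
    x = torch.randn(seq_len, d)
    w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
    print(soft_self_attention(x, w_q, w_k, w_v).shape)           # torch.Size([6, 8])
    print(sparse_signed_self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([6, 8])
```

In this sketch, sparsity keeps the weights of the retained tokens from being diluted as the sentence grows, and the signed entries let one token push the representation away from, not only toward, another token's value vector, which is the intuition the abstract describes.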

Acknowledgements

This work was supported by the Innovation Foundation of Science and Technology of Dalian under Grant No. 2018J12GX045 and the National Key R&D Program of China under Grant No. 2018AAA0100300.

Author information

Corresponding author

Correspondence to Nianmin Yao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Q., Yao, N., Zhao, J. et al. Self attention mechanism of bidirectional information enhancement. Appl Intell 52, 2530–2538 (2022). https://doi.org/10.1007/s10489-021-02492-2

