Abstract
The self-attention mechanism is widely used in relation extraction, sentiment classification, and other tasks because it can capture a wide range of relevance information in text. Existing self-attention mechanisms use soft attention; that is, a dense attention matrix is generated by the softmax function. However, when a sentence is long, the weights assigned to important information become too small. Moreover, the softmax function assumes by default that every element has a positive influence on the result, so the model cannot extract information with a negative effect. We use a hard attention mechanism, namely a sparse attention matrix, to improve the existing self-attention model and fully extract both the positive and the negative information in text. Our model not only enhances the extraction of positive information but also fills the gap left by traditional attention matrices, whose entries cannot be negative. We evaluated our model on three tasks and seven datasets. The experimental results show that our model outperforms the traditional self-attention model and surpasses state-of-the-art models on some tasks.
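To make the contrast concrete, the following is a minimal NumPy sketch, our illustration rather than the paper's implementation: the thresholded tanh scoring rule and the cutoff `tau` are assumptions made for exposition. It shows how softmax yields a dense, strictly positive weight vector whose mass thins out as the sequence grows, while a sparse signed variant can zero out weak scores and retain negative ones.

```python
import numpy as np

def softmax_attention(scores):
    """Soft attention: a dense, strictly positive distribution.
    As the sequence grows, the probability mass spreads over more
    tokens, so important tokens end up with small weights."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)

def sparse_signed_attention(scores, tau=0.5):
    """Illustrative hard/sparse attention (an assumption, not the
    authors' exact rule): entries with |score| < tau are zeroed, and
    the survivors keep their sign via tanh, so a token can push the
    representation in a negative direction."""
    return np.where(np.abs(scores) >= tau, np.tanh(scores), 0.0)

rng = np.random.default_rng(0)
scores = rng.normal(size=8)                # raw attention scores for 8 tokens
print(softmax_attention(scores))           # dense, all positive, sums to 1
print(sparse_signed_attention(scores))     # sparse, with signed entries
```

The point of the contrast is the co-domain: softmax weights lie on the probability simplex, whereas the sparse variant's weights lie in [-1, 1] with exact zeros, which is what allows a model to both concentrate weight on a few tokens and encode negative influence.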
Acknowledgements
This work was supported by the Innovation Foundation of Science and Technology of Dalian under Grant No. 2018J12GX045 and by the National Key R&D Program of China under Grant No. 2018AAA0100300.
Cite this article
Li, Q., Yao, N., Zhao, J. et al. Self attention mechanism of bidirectional information enhancement. Appl Intell 52, 2530–2538 (2022). https://doi.org/10.1007/s10489-021-02492-2