Abstract
The self-attention mechanism is widely used in relation extraction, sentiment classification, and other tasks because it can capture a wide range of relevance information in text. Existing self-attention mechanisms use soft attention; that is, a dense attention matrix is generated by the softmax function. However, when a sentence is long, the weights assigned to important information become too small. Moreover, the softmax function assumes by default that every element has a positive influence on the result, so the model cannot extract information with a negative effect. We use a hard attention mechanism, namely a sparse attention matrix, to improve the existing self-attention model and fully extract both the positive and the negative information in text. Our model not only enhances the extraction of positive information but also fills the gap left by traditional attention matrices, whose entries cannot be negative. We evaluated our model on three tasks and seven datasets. The experimental results show that our model outperforms the traditional self-attention model and surpasses state-of-the-art models on some tasks.
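To make the contrast concrete, the following is a minimal NumPy sketch, our illustration rather than the paper's implementation: the thresholded tanh scoring rule and the cutoff `tau` are assumptions made for exposition. It shows how softmax yields a dense, strictly positive weight vector whose mass thins out as the sequence grows, while a sparse signed variant can zero out weak scores and retain negative ones.

```python
import numpy as np

def softmax_attention(scores):
    """Soft attention: a dense, strictly positive distribution.
    As the sequence grows, the probability mass spreads over more
    tokens, so important tokens end up with small weights."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)

def sparse_signed_attention(scores, tau=0.5):
    """Illustrative hard/sparse attention (an assumption, not the
    authors' exact rule): entries with |score| < tau are zeroed, and
    the survivors keep their sign via tanh, so a token can push the
    representation in a negative direction."""
    return np.where(np.abs(scores) >= tau, np.tanh(scores), 0.0)

rng = np.random.default_rng(0)
scores = rng.normal(size=8)                # raw attention scores for 8 tokens
print(softmax_attention(scores))           # dense, all positive, sums to 1
print(sparse_signed_attention(scores))     # sparse, with signed entries
```

The point of the contrast is the co-domain: softmax weights lie on the probability simplex, whereas the sparse variant's weights lie in [-1, 1] with exact zeros, which is what allows a model to both concentrate weight on a few tokens and encode negative influence.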
Acknowledgements
This work was supported by the Innovation Foundation of Science and Technology of Dalian under Grant No. 2018J12GX045 and by the National Key R&D Program of China under Grant No. 2018AAA0100300.
Cite this article
Li, Q., Yao, N., Zhao, J. et al. Self attention mechanism of bidirectional information enhancement. Appl Intell 52, 2530–2538 (2022). https://doi.org/10.1007/s10489-021-02492-2