BertHANK: hierarchical attention networks with enhanced knowledge and pre-trained model for answer selection

Regular Paper, published in Knowledge and Information Systems

Abstract

In recent years, Community Question Answering (CQA) has become increasingly prevalent because it provides platforms where users can gather information and share knowledge. However, a question in a CQA system often receives many candidate answers, and it is impractical for users to inspect them one by one to find the most relevant one. Answer selection is therefore an important task in CQA. In this paper, we propose a novel solution, BertHANK, a hierarchical attention network with enhanced knowledge and a pre-trained model for answer selection. Specifically, in the encoding stage, knowledge enhancement is applied to questions and a pre-trained model to answers. We then adopt a multi-attention mechanism, including cross-attention on question-answer pairs, inner attention on questions at the word level, and hierarchical inner attention on answers at both the word and sentence levels, to capture more subtle semantic features. In more detail, the cross-attention captures interactive information between the encoded questions and answers, while the hierarchical inner attention assigns different weights to words within sentences and to sentences within answers, thereby obtaining both global and local information about question-answer pairs and helping select the best-matched answer for a given question. Finally, we integrate the attended question and answer representations to make predictions. The results show that our model achieves state-of-the-art performance on two corpora, the SemEval-2015 and SemEval-2017 CQA datasets, outperforming strong baselines by a large margin.
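The pipeline described in the abstract can be sketched end to end. The following PyTorch snippet is a minimal, illustrative reconstruction rather than the authors' implementation: the nn.Embedding and BiLSTM layers stand in for the knowledge-enhanced question encoder and the BERT-based answer encoder, and all names (InnerAttention, BertHANKSketch), dimensions, and hyperparameters are assumptions made for the example.

```python
# Minimal sketch of a BertHANK-style pipeline (assumed, not the authors' code).
import torch
import torch.nn as nn


class InnerAttention(nn.Module):
    """Additive attention that pools a sequence of hidden states into one vector."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.score = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                                    # h: (batch, seq, dim)
        w = torch.softmax(self.score(torch.tanh(self.proj(h))), dim=1)
        return (w * h).sum(dim=1)                            # (batch, dim)


class BertHANKSketch(nn.Module):
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        # Placeholder encoders: a knowledge-enhanced question encoder and a
        # BERT answer encoder would replace the embedding + BiLSTM layers.
        self.embed = nn.Embedding(vocab_size, dim)
        self.q_enc = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.a_enc = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.cross = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.q_word_attn = InnerAttention(dim)               # word-level, questions
        self.a_word_attn = InnerAttention(dim)               # word-level, answer sentences
        self.a_sent_attn = InnerAttention(dim)               # sentence-level, answers
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, q_ids, a_ids):
        # q_ids: (batch, q_len); a_ids: (batch, n_sents, s_len)
        b, n_sents, s_len = a_ids.shape
        q_h, _ = self.q_enc(self.embed(q_ids))               # (b, q_len, dim)
        a_h, _ = self.a_enc(self.embed(a_ids.view(b * n_sents, s_len)))

        # Hierarchical inner attention on answers: words -> sentences -> answer.
        sent_vecs = self.a_word_attn(a_h).view(b, n_sents, -1)
        ans_vec = self.a_sent_attn(sent_vecs)                # (b, dim)

        # Cross-attention: question tokens attend over answer sentence vectors,
        # then word-level inner attention pools the question representation.
        q_cross, _ = self.cross(q_h, sent_vecs, sent_vecs)   # (b, q_len, dim)
        q_vec = self.q_word_attn(q_cross)                    # (b, dim)

        # Integrate the attended question and answer representations.
        return self.classifier(torch.cat([q_vec, ans_vec], dim=-1)).squeeze(-1)


if __name__ == "__main__":
    model = BertHANKSketch()
    q = torch.randint(0, 30000, (2, 12))      # 2 questions, 12 tokens each
    a = torch.randint(0, 30000, (2, 4, 20))   # 2 answers, 4 sentences of 20 tokens
    print(model(q, a).shape)                  # torch.Size([2]) relevance scores
```

Under these assumptions, swapping the placeholder encoders for a knowledge-enhanced embedding layer and a BERT encoder would recover the encoding stage described above.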



Author information


Correspondence to Haitian Yang or Yan Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, H., Zhao, X., Wang, Y. et al. BertHANK: hierarchical attention networks with enhanced knowledge and pre-trained model for answer selection. Knowl Inf Syst 64, 2189–2213 (2022). https://doi.org/10.1007/s10115-022-01703-7
