Improving and evaluating complex question answering over knowledge bases by constructing strongly supervised data

Cao, Xing; Zhao, Yingsi; Shen, Bo

doi:10.1007/s00521-022-07965-0

Original Article
Published: 05 November 2022

Neural Computing and Applications, Volume 35, pages 5513–5533 (2023)

Xing Cao^1,3,
Yingsi Zhao^2,na1 (ORCID: orcid.org/0000-0002-0821-1852) &
Bo Shen^1,3,na1

Abstract

Complex question answering (CQA) is widely used in real-world tasks such as search engines and intelligent customer service. With the development of large-scale knowledge bases, CQA over knowledge bases has attracted considerable attention in recent years. However, there are many types of complex questions, and few works analyse in depth how models perform on each type. Another major challenge is the lack of complete supervised labels, which are expensive to obtain by manual labelling; their absence reduces model interpretability and makes model training harder. In this paper, we construct a dataset, named CoSuQue, which includes multiple types of complex questions and complete supervised labels that are easily obtained. Our work provides an in-depth analysis of a model's ability to answer different types of questions, contributing a comprehensive evaluation of the performance of CQA models. Knowing how a model handles each type of question allows its structure to be improved in a more targeted manner. The different types of complex questions and the complete supervised labels also allow the inference process of the model to be investigated. Furthermore, we propose a novel training method that leverages the proposed dataset to improve the performance of the model on other publicly available datasets. Experiments on the Complex WebQuestions and WebQuestionsSP datasets demonstrate the effectiveness of our approach on the CQA task.

Availability of data and materials

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Code availability

Some models or code that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

https://stanfordnlp.github.io/CoreNLP/.
The knowledge base can be downloaded from https://developers.google.com/freebase/.

Acknowledgements

This research was supported by the Fundamental Research Funds for the Central Universities (Grant number 2020YJS012) and the National Key R&D Program of China (No. 2018YFC0832300; No. 2018YFC0832303).

Funding

This research was funded by the Fundamental Research Funds for the Central Universities (Grant number 2020YJS012) and the National Key R&D Program of China (No. 2018YFC0832300; No. 2018YFC0832303).

Author information

Yingsi Zhao and Bo Shen have contributed equally to this work.

Authors and Affiliations

School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, 100044, China
Xing Cao & Bo Shen
School of Economics and Management, Beijing Jiaotong University, Beijing, 100044, China
Yingsi Zhao
Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing, 100044, China
Xing Cao & Bo Shen

Contributions

XC, YZ and BS designed the study; XC performed the experiments, analysed the data, and wrote the manuscript.

Corresponding author

Correspondence to Yingsi Zhao.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, X., Zhao, Y. & Shen, B. Improving and evaluating complex question answering over knowledge bases by constructing strongly supervised data. Neural Comput & Applic 35, 5513–5533 (2023). https://doi.org/10.1007/s00521-022-07965-0

Received: 02 March 2022
Accepted: 17 October 2022
Published: 05 November 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s00521-022-07965-0
