Improving and evaluating complex question answering over knowledge bases by constructing strongly supervised data

  • Original Article
Neural Computing and Applications

Abstract

Complex question answering (CQA) is widely used in real-world tasks such as search engines and intelligent customer service. With the development of large-scale knowledge bases, CQA over knowledge bases has attracted considerable attention in recent years. However, there are many types of complex questions, and few studies analyse in depth how models perform on each type. Another major challenge is the lack of complete supervised labels due to the expense of manual labelling, which reduces model interpretability and makes model training harder. In this paper, we construct a dataset, named CoSuQue, that includes multiple types of complex questions and complete supervised labels that are easily obtained. Our work provides an in-depth analysis of the model’s ability to answer different types of questions, contributing a comprehensive evaluation of the performance of CQA models. Based on the ability of the model to handle different types of questions, the model structure can be improved in a more targeted manner. The different types of complex questions and the complete supervised labels also allow the inference process of the model to be investigated. Furthermore, we propose a novel training method that leverages the proposed dataset to improve the performance of the model on other publicly available datasets. Experiments on the Complex WebQuestions and WebQuestionsSP datasets demonstrate the effectiveness of our approach on the CQA task.
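As a rough illustration of the per-type evaluation the abstract describes, the sketch below groups predictions by question type and reports the fraction answered correctly for each type. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the record fields (`type`, `prediction`, `answers`), the function name `accuracy_by_type`, and the type labels in the toy data are all hypothetical, and the metric here is a simple top-1 match against the gold answer set.

```python
from collections import defaultdict


def accuracy_by_type(examples):
    """Return {question_type: fraction of questions whose top prediction is a gold answer}.

    Each example is a dict with hypothetical keys:
      "type"       - label of the question type
      "prediction" - the model's top-ranked answer
      "answers"    - set of gold answers
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        qtype = ex["type"]
        total[qtype] += 1
        # Count a hit when the top prediction is one of the gold answers.
        if ex["prediction"] in ex["answers"]:
            correct[qtype] += 1
    return {qtype: correct[qtype] / total[qtype] for qtype in total}


# Toy records with made-up type labels and answers (illustration only).
examples = [
    {"type": "composition", "prediction": "a", "answers": {"a"}},
    {"type": "composition", "prediction": "b", "answers": {"c"}},
    {"type": "conjunction", "prediction": "d", "answers": {"d", "e"}},
]
scores = accuracy_by_type(examples)
for qtype, score in sorted(scores.items()):
    print(f"{qtype}: {score:.2f}")
```

Reporting one score per type, rather than a single aggregate, is what makes it possible to see which kinds of questions a model handles poorly and to target improvements accordingly.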

Availability of data and materials

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Code availability

Some of the models or code that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

  1. https://stanfordnlp.github.io/CoreNLP/.

  2. The knowledge base can be downloaded from https://developers.google.com/freebase/.

Acknowledgements

This research was supported by the Fundamental Research Funds for the Central Universities (Grant Number 2020YJS012) and the National Key R&D Program of China (No. 2018YFC0832300; No. 2018YFC0832303).

Funding

This research was funded by the Fundamental Research Funds for the Central Universities (Grant Number 2020YJS012) and the National Key R&D Program of China (No. 2018YFC0832300; No. 2018YFC0832303).

Author information

Contributions

XC, YZ and BS designed the study; XC performed the experiments, analysed the data, and wrote the manuscript.

Corresponding author

Correspondence to Yingsi Zhao.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, X., Zhao, Y. & Shen, B. Improving and evaluating complex question answering over knowledge bases by constructing strongly supervised data. Neural Comput & Applic 35, 5513–5533 (2023). https://doi.org/10.1007/s00521-022-07965-0

  • DOI: https://doi.org/10.1007/s00521-022-07965-0
