LLM-Based SPARQL Generation with Selected Schema from Large Scale Knowledge Base

  • Conference paper
Knowledge Graph and Semantic Computing: Knowledge Graph Empowers Artificial General Intelligence (CCKS 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1923)


Abstract

Knowledge base question answering (KBQA) aims to answer natural language questions over structured knowledge bases. Common approaches fall into two families, semantic parsing-based and retrieval-based, and both have limitations: retrieval-based methods struggle with questions that require complex reasoning, while semantic parsing methods rely on a multi-step pipeline and cannot tolerate errors made in earlier steps when generating the final logical form. In this paper, we propose a large language model (LLM)-based SPARQL generation model that accepts multiple candidate entities and relations as input, reducing its reliance on mention extraction and entity linking performance, and we introduce a mention-based entity combination strategy that produces multiple SPARQL queries for a single question to increase the chance of finding the correct answer. Our model achieves state-of-the-art performance in the CCKS2023 CKBQA competition, with an F1 score of 75.63%.
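As an illustration of the pipeline the abstract outlines, the sketch below shows how a question together with candidate entities and relations might be packed into a single generation prompt, and how enumerating one entity per mention yields several SPARQL candidates for one question. This is a minimal reconstruction, not the authors' released code: `build_prompt`, `llm_generate`, `mention_to_entities`, and `entity_to_relations` are illustrative names, and the prompt wording is assumed.

```python
from itertools import product


def build_prompt(question, entities, relations):
    """Pack the question plus candidate schema items into one generation prompt (wording assumed)."""
    return (
        "Generate a SPARQL query for the question.\n"
        f"Question: {question}\n"
        f"Candidate entities: {', '.join(entities)}\n"
        f"Candidate relations: {', '.join(relations)}\n"
        "SPARQL:"
    )


def llm_generate(prompt):
    """Placeholder for the fine-tuned LLM call (e.g., a ChatGLM-6B-style model); replace with a real model."""
    raise NotImplementedError


def candidate_queries(question, mention_to_entities, entity_to_relations):
    """Yield one SPARQL candidate per combination of linked entities (one entity chosen per mention)."""
    mentions = list(mention_to_entities)
    for combo in product(*(mention_to_entities[m] for m in mentions)):
        # Collect the relations attached to the chosen entities as the candidate schema.
        relations = sorted({r for e in combo for r in entity_to_relations.get(e, [])})
        prompt = build_prompt(question, list(combo), relations)
        yield combo, llm_generate(prompt)
```

Under this reading, each candidate query would then be executed against the knowledge base and a non-empty result kept as the final answer, which is how generating multiple SPARQL queries per question can raise the chance of hitting the correct one.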


Notes

  1. https://github.com/THUDM/ChatGLM-6B


Author information

Correspondence to Shuangtao Yang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yang, S., Teng, M., Dong, X., Bo, F. (2023). LLM-Based SPARQL Generation with Selected Schema from Large Scale Knowledge Base. In: Wang, H., Han, X., Liu, M., Cheng, G., Liu, Y., Zhang, N. (eds) Knowledge Graph and Semantic Computing: Knowledge Graph Empowers Artificial General Intelligence. CCKS 2023. Communications in Computer and Information Science, vol 1923. Springer, Singapore. https://doi.org/10.1007/978-981-99-7224-1_24

  • DOI: https://doi.org/10.1007/978-981-99-7224-1_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7223-4

  • Online ISBN: 978-981-99-7224-1

  • eBook Packages: Computer Science, Computer Science (R0)
