Introduction

Large language models (LLMs) have gained popularity in the NLP field due to their impressive performance [1,2,3]. Pre-trained on massive data, LLMs show far stronger natural language understanding and reasoning capabilities than other language models. Although LLMs perform well on a wide range of downstream tasks, they still struggle with knowledge-intensive challenges. Studies have shown that LLMs suffer from hallucinations and knowledge limitations, including outdated or incorrect facts and a lack of specialized knowledge [4,5,6]. Furthermore, LLMs have difficulty reasoning over long logical sequences or intricate structures [7]. These shortcomings restrict their use, especially in high-risk and high-sensitivity fields such as medicine.

Improving LLMs with external knowledge is an intuitive approach to overcoming their limitations. This is particularly useful for question-answering (QA) tasks, where the process involves retrieving correct and up-to-date knowledge relevant to the question, constructing prompts, and feeding these prompts to the LLM for analysis or summarization, as illustrated in Fig. 1. A knowledge graph (KG) is a vital source of such external knowledge: through structured data storage techniques, it can support temporal and multimodal knowledge [8, 9]. Knowledge graphs, which store comprehensive real-world information as a graph of triples, offer more robust semantic logic than plain text and are better suited to supporting logical reasoning tasks.

To enhance the performance of LLMs with a knowledge graph, retrieving and reasoning over multi-hop paths is essential. However, there are three main challenges. First, each entity in the knowledge graph is connected to a large number of relations, most of which are irrelevant to the given question. Without efficient filtering, the lengthy context and excess invalid information can lead LLMs to incorrect reasoning [10]. Second, questions often require a multi-hop search over the graph, which can cause the search space to grow exponentially, so effective retrieval and pruning methods are essential. Finally, the knowledge graph's triple structure can be difficult for general LLMs to process, as they are not typically pre-trained or fine-tuned on structured data [11]. Finding an appropriate knowledge representation is therefore crucial for effective prompting.

Fig. 1 LLM suffers from hallucination and knowledge limitation, which can be solved with external knowledge

Considering the above challenges, this paper proposes KnowledgeNavigator, a novel general framework for enhanced knowledge graph reasoning. It consists of three stages: Question Analysis, Knowledge Retrieval, and Reasoning. KnowledgeNavigator starts by predicting the retrieval scope required for the question and creating a set of similar queries. Guided by the question, it iteratively retrieves and filters relevant relations and entities at each hop within the knowledge graph, ensuring that only the knowledge necessary to answer the question is recalled. This knowledge is then synthesized and converted into natural language to minimize redundancy and circumvent the limitations of LLMs in processing triples. The refined knowledge is finally fed to the LLM for answer reasoning. In this pipeline, the knowledge graph serves as an external knowledge source, while the LLM enhances the understanding of question semantics, predicts the search direction, and performs the reasoning. Both components function as plug-ins within KnowledgeNavigator. This design allows KnowledgeNavigator to support any knowledge graph and backbone LLM, capitalizing on the timely updated knowledge and domain-specific information in the knowledge graph without the overhead of frequently retraining the LLM.

The main contributions of this paper can be summarized as follows:

  • KnowledgeNavigator is a novel framework that leverages semantic and structural information to guide LLMs in enhanced multi-hop reasoning on knowledge graphs. It effectively retrieves external knowledge to assist in generating reliable answers for KGQA tasks.

  • KnowledgeNavigator features a general process design. The iterative retrieval module deploys similar-question generation and a voting mechanism to rerank candidate knowledge, enhancing the alignment between target queries and reasoning paths. The knowledge representation module reorganizes and converts triple knowledge into LLM-friendly formats, reducing the complexity of reasoning. KnowledgeNavigator can therefore be used directly with various LLMs and KGs without retraining or fine-tuning.

  • KnowledgeNavigator is evaluated on several KGQA benchmarks to validate its effectiveness. It outperforms all LLM-based baselines and achieves competitive performance with fully supervised models.

Related work

Knowledge reasoning for KGQA

Essentially, a knowledge graph is a semantic network of entities, concepts, and the relations between them [12]. The KGQA task, an important application of knowledge graphs in the NLP field, aims to answer a given question by mining and reasoning over an existing knowledge graph [13]. Reasoning over knowledge graphs is crucial for KGQA because knowledge graphs are inherently incomplete and noisy to varying degrees. Early knowledge reasoning relied mainly on logical rules, which require experts to design grammars and rules for specific domains. These methods are highly interpretable but require extensive manual intervention and cannot be generalized efficiently [14,15,16]. With the development of representation learning, many studies consider both local and global or high-level and low-level knowledge correlations to enhance feature extraction and representation capabilities and support various downstream tasks [17, 18]. In KGQA, many studies likewise apply embeddings with rich semantic information to map entities and relations to a low-dimensional vector space and capture their latent semantic relationships to extract the optimal answer. These studies greatly improve the performance of knowledge reasoning for KGQA, but their effectiveness relies on the quality of the embedding models, and they lack interpretability [19,20,21]. To better solve complex multi-hop reasoning, more recent work applies neural networks to learn the interaction patterns between entities and relations in the knowledge graph, achieving automatic and accurate reasoning and improving the generalization of reasoning models [22, 23]. Modified metaheuristics have also been applied to improve a model's ability to represent patterns by searching for the optimal hyperparameter combination, thereby improving performance on KGQA tasks [24, 25].

Knowledge graph enhanced LLM

Knowledge graphs support the structured representation of real-world knowledge and, through temporal and personalized designs, can meet a variety of knowledge storage and usage requirements [26]. Knowledge graphs are therefore applied as an important knowledge source to enhance both LLM pre-training and LLM generation [27]. The structured information in a knowledge graph has clearer logic and reasoning paths than natural language, so many studies use entities and relations to build corpora and design training tasks that enhance LLM pre-training [28,29,30]. However, both retraining and continued pre-training of LLMs require substantial computing resources and time, making it hard to keep up with rapidly evolving knowledge applications. A more straightforward approach to addressing the lack of knowledge in LLMs is therefore to construct knowledge-enhanced prompts containing factual information. Many works retrieve knowledge related to the target question through external retrieval algorithms and incorporate this knowledge into prompts that help the LLM reason in unfamiliar domains [31, 32]. However, in scenarios involving long reasoning chains and long-tail knowledge, it is challenging to retrieve knowledge that effectively supports the LLM. To address these challenges, our work sets up a comprehensive process encompassing question analysis, knowledge retrieval, and reasoning, enabling efficient and accurate knowledge retrieval and effective knowledge expression. This approach meets the needs of LLMs when conducting complex reasoning tasks for which they lack internal knowledge, facilitated by knowledge graphs.

Method

KnowledgeNavigator is designed to support KGQA tasks by performing enhanced reasoning on knowledge graphs. The reasoning process of KnowledgeNavigator contains three stages: Question Analysis, Knowledge Retrieval, and Reasoning, as shown in Fig. 2.

Fig. 2 An overview of KnowledgeNavigator. The framework consists of three consecutive phases: Question Analysis, Knowledge Retrieval, and Reasoning. The given example comes from MetaQA, describing a 2-hop reasoning task starting from Babaloo Mandel and ending with entities including Tom Hanks. In the knowledge graph, solid lines indicate that entities or relations are retrieved as reasoning knowledge, while dashed lines indicate that entities or relations are discarded

Question analysis

The multi-hop reasoning of questions is the main challenge in KGQA tasks. The Question Analysis stage supports enhanced reasoning on the knowledge graph by pre-analyzing the given question, which helps to improve retrieval efficiency and accuracy.

To answer a question Q, KnowledgeNavigator first predicts the potential hop number \(h_Q\) of the question, i.e. the maximum reasoning depth required to retrieve all the knowledge needed, starting from the core entities. Hop number prediction is a classification task, which KnowledgeNavigator implements with a fine-tuned pre-trained language model (PLM) and a simple linear classifier:

$$\begin{aligned} V_Q = PLM(Q) \end{aligned}$$
(1)
$$\begin{aligned} h_Q = \arg \max _{h} P(h \mid V_Q), \quad h \in \{1, 2, \ldots , H\} \end{aligned}$$
(2)
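The hop predictor admits a very compact implementation. Below is a minimal sketch of Eqs. (1)-(2) in Python, assuming the bert-base-uncased encoder and linear classifier mentioned in the experiment details; the fine-tuning loop and the exact pooling strategy are simplifications, not the paper's released code.

```python
# Hop-number classifier: a BERT encoder followed by a linear head over H classes.
import torch
from transformers import AutoModel, AutoTokenizer

H = 3  # maximum hop depth considered (MetaQA questions need 1-3 hops)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(encoder.config.hidden_size, H)

def predict_hops(question: str) -> int:
    """Return the predicted hop number h_Q in {1, ..., H}."""
    inputs = tokenizer(question, return_tensors="pt", truncation=True)
    with torch.no_grad():
        v_q = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] vector, V_Q in Eq. (1)
    logits = classifier(v_q)                             # scores P(h | V_Q) up to softmax
    return int(logits.argmax(dim=-1).item()) + 1         # classes 0..H-1 map to hops 1..H

print(predict_hops("Who starred in the movies written by Babaloo Mandel?"))
```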

The reasoning logic of each question in the KGQA task is implied in the semantics of the question itself, so knowledge graph reasoning is a process of mining this reasoning logic from the question. To enhance this mining, KnowledgeNavigator uses the LLM to generate a set of similar questions \(S = \{s^Q_1, s^Q_2, \ldots , s^Q_m\}\) with the same semantics as the original question. Different phrasings of the same question can shed light on the reasoning logic from different angles, so these similar questions enrich the information available during the Knowledge Retrieval stage.
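A hedged sketch of this step follows: the prompt wording and the call_llm helper below are illustrative assumptions, not the paper's actual template or API.

```python
# Generate m semantically equivalent paraphrases of the original question.
PARAPHRASE_PROMPT = """Rewrite the question below in {m} different ways.
Keep the meaning exactly the same and mention the same entities.
Return one rewrite per line.

Question: {question}"""

def call_llm(prompt: str) -> str:
    """Placeholder: route to the backbone LLM (e.g. ChatGPT or LLama-2-70B-Chat)."""
    raise NotImplementedError

def generate_similar_questions(question: str, m: int = 2) -> list[str]:
    response = call_llm(PARAPHRASE_PROMPT.format(m=m, question=question))
    variants = [line.strip() for line in response.splitlines() if line.strip()]
    return variants[:m]  # the set S of question variants
```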

In the case of Fig. 2, KnowledgeNavigator predicts that the number of reasoning hops \(h_Q\) starting from the core entity "Babaloo Mandel" is 2, using a PLM fine-tuned on the MetaQA 2-hop dataset. It then generates S containing two variants of the original question.

Knowledge retrieval

Extracting relevant knowledge from the knowledge graph is crucial for answering a given question. The Knowledge Retrieval stage aims to extract the logical path by performing advanced reasoning on the knowledge graph, constructing a smaller, more focused subgraph that aids answer generation. The retrieval process is mainly achieved by interacting with the LLM, which avoids the expense of retraining a model for each task.

Knowledge retrieval is an iterative search process with a depth limit of \(h_Q\). In each iteration i, KnowledgeNavigator begins with a set of core entities \(E_i = \{e^1_i, e^2_i, \ldots , e^n_i\}\). It then explores all one-hop relations connected to each entity, forming a candidate relation set \(R^n_i = \{r^{n,1}_i, r^{n,2}_i, \ldots , r^{n,k}_i\}\). Since an entity may have many relations in a knowledge graph, not all of which are relevant to the question, the reasoning path must be pruned to minimize the influence of unrelated or noisy knowledge on answer generation. KnowledgeNavigator linearizes the candidate relations of each entity into a string and formats it, together with the entity and each question variant in S, into a prompt for the LLM (see the sketch below). The LLM is tasked with choosing the K relations from \(R^n_i\) most relevant to the question variant.
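As an illustration, the prompt for one (entity, question variant) pair might be assembled as follows; the exact wording is an assumption, while the "; "-separated relation serialization matches the Fig. 2 example.

```python
# Build the relation-filtering prompt for one entity and one question variant.
def build_relation_prompt(entity: str, relations: list[str],
                          question: str, k: int = 1) -> str:
    serialized = "; ".join(relations)  # linearize the candidate relation set
    return (
        f"Question: {question}\n"
        f"Entity: {entity}\n"
        f"Candidate relations: {serialized}\n"
        f"Select the {k} relation(s) most useful for answering the question. "
        f"Answer with the relation names only."
    )

print(build_relation_prompt(
    "Babaloo Mandel",
    ["birth_year", "birth_place", "written_by", "created_by"],
    "Who starred in the movies written by Babaloo Mandel?",
))
```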

Based on the results of relation filtering, a weighted voting mechanism ranks each relation linked to entity \(e^n_i\) by its selection frequency. Relations chosen for the original question are given double the weight of those chosen for the variants generated by the LLM in the first stage:

$$\begin{aligned} \text {Score}(r) = \sum _{s \in S} w(s) \cdot \mathbb {I}\left( r, LLM(e, s, R)\right) \end{aligned}$$
(3)
$$\begin{aligned} w(s) = {\left\{ \begin{array}{ll} 2 &{} \text {if } s = Q \\ 1 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(4)

Here, the indicator function \(\mathbb {I}\) denotes whether a relation r from the set R is chosen by the LLM: it takes the value 1 if the relation is selected and 0 otherwise.
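Eqs. (3)-(4) reduce to a small weighted tally. The sketch below reproduces the Fig. 2 vote {written_by: 3, created_by: 1}; the shape of the selections input is a hypothetical encoding of the LLM's per-variant choices.

```python
# Weighted vote over the relations selected for each question variant.
from collections import Counter

def score_relations(original: str, selections: dict[str, set[str]]) -> Counter:
    scores: Counter = Counter()
    for question, chosen in selections.items():
        weight = 2 if question == original else 1   # w(s) in Eq. (4)
        for relation in chosen:                     # indicator I(r, LLM(e, s, R))
            scores[relation] += weight
    return scores

votes = score_relations(
    "Who starred in the movies written by Babaloo Mandel?",
    {
        "Who starred in the movies written by Babaloo Mandel?": {"written_by"},
        "Which actors appear in films that Babaloo Mandel wrote?": {"written_by"},
        "Who acted in the movies Babaloo Mandel created?": {"created_by"},
    },
)
print(votes.most_common(1))  # keep the top-M relations, M = 1 here
```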

The relations of each entity are ranked independently, enhancing the diversity of the reasoning process. After filtering all relations in iteration i, KnowledgeNavigator selects the M optimal relations for each entity. It then queries the triples \((e^n_i, optimal\_r^n_i, tail)\) and \((head, optimal\_r^n_i, e^n_i)\) from the knowledge graph. These triples form part of the reasoning path and are added to the retrieved knowledge set RK. The untraversed entities among the tail and head entities are compiled into the core entity set \(E_{i+1}\) for the next iteration.

In this stage, KnowledgeNavigator begins with the core entities \(E_0\) extracted from the given question Q. It then iteratively filters relations and adds triples to RK until the depth \(h_Q\) is reached, as condensed in the sketch below. The triples in RK are used as reasoning knowledge in the next stage.
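Putting the pieces together, the retrieval loop can be condensed as follows. The three helper functions are placeholders for the KG queries and the LLM filtering-plus-voting steps sketched above, not real APIs.

```python
# Iterative, depth-limited knowledge retrieval from the core entities E_0.
def get_relations(entity):            # placeholder KG accessor: one-hop relations
    raise NotImplementedError

def get_triples(entity, relation):    # placeholder: (e, r, tail) and (head, r, e) triples
    raise NotImplementedError

def select_relations(entity, candidates, questions, m):  # placeholder LLM filter + vote
    raise NotImplementedError

def retrieve_knowledge(core_entities, questions, h_q, m=1):
    """Iterate h_q hops, collecting the reasoning triples RK."""
    frontier, visited, rk = set(core_entities), set(core_entities), []
    for _ in range(h_q):
        next_frontier = set()
        for entity in frontier:
            candidates = get_relations(entity)
            for relation in select_relations(entity, candidates, questions, m):
                for head, r, tail in get_triples(entity, relation):
                    rk.append((head, r, tail))
                    for e in (head, tail):          # expand with untraversed entities
                        if e not in visited:
                            visited.add(e)
                            next_frontier.add(e)
        frontier = next_frontier                    # E_{i+1} for the next iteration
    return rk
```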

Reasoning

Through several iterations, KnowledgeNavigator accumulates enough knowledge in RK to address the given question. The Reasoning stage then leverages this knowledge to generate the answer.

The knowledge retrieved from the knowledge graph is structured as triples in the format [head, relation, tail]. Each triple is an implicit expression of the reasoning path. To fully answer a question, the entities and relations from multiple triples can be linked into a reasoning path and, further, a reasoning sub-graph. Merging nodes and condensing this sub-graph through triple aggregation improves the reasoning efficiency of LLMs. For instance, KnowledgeNavigator aggregates triples T within RK that share the same head or tail entity and relation into a single consolidated triple:

$$\begin{aligned} f_{head}(T) = \left\{ (h, r, [a_1, \ldots , a_n]) \mid \forall (h, r, a_i) \in T \right\} \end{aligned}$$
(5)
$$\begin{aligned} f_{tail}(T) = \left\{ ([h_1, \ldots , h_n], r, a) \mid \forall (h_i, r, a) \in T \right\} \end{aligned}$$
(6)

This effectively reduces redundant information and enhances the knowledge representation. KnowledgeNavigator then converts the aggregated triples into natural language using templates (e.g. "The {relation} of {head} is(are): {tail}"), circumventing the limited capacity of LLMs to understand data structured as triples. Finally, the natural-language knowledge is merged into a single string and fed to the LLM along with the question Q. The LLM is prompted to generate an answer based entirely on the provided external knowledge, without using its own learned knowledge.
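A minimal sketch of the aggregation in Eqs. (5)-(6) and the template conversion, using the Fig. 2 triples; the grouping keys and template wording follow the description above, but the code itself is illustrative.

```python
# Aggregate triples sharing the same (head, relation), then verbalize them.
from collections import defaultdict

def aggregate_by_head(triples):
    grouped = defaultdict(list)
    for head, rel, tail in triples:          # f_head: merge on (head, relation)
        grouped[(head, rel)].append(tail)
    return [(h, r, tails) for (h, r), tails in grouped.items()]

def verbalize(head, rel, tails):
    # Template: "The {relation} of {head} is(are): {tail}"
    return f"The {rel.replace('_', ' ')} of {head} is(are): {', '.join(tails)}."

triples = [("Splash", "starred_actors", "Daryl Hannah"),
           ("Splash", "starred_actors", "Tom Hanks")]
for h, r, tails in aggregate_by_head(triples):
    print(verbalize(h, r, tails))
# -> The starred actors of Splash is(are): Daryl Hannah, Tom Hanks.
```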

Case study

Figure 2 is an example of KnowledgeNavigator performing a KGQA task. First, KnowledgeNavigator predicts the reasoning hop number starting from the core entity "Babaloo Mandel" with a PLM and generates two similar questions for the target question with the LLM.

Then, KnowledgeNavigator extracts all relations linked to "Babaloo Mandel" and serializes them into "birth_year; birth_place; written_by; created_by" as part of the prompt. For the only core entity, "Babaloo Mandel", the LLM selects the optimal relation for each question variant, producing the weighted voting result {written_by: 3, created_by: 1}. Since the number of optimal relations selected per entity is set to one, the triples with "written_by" as the optimal relation and "Babaloo Mandel" as the head or tail entity (i.e. [Splash, written_by, Babaloo Mandel] and [Parenthood, written_by, Babaloo Mandel]) are extracted as the first step of the reasoning path. The head entities "Splash" and "Parenthood" are selected as the core entities for the second iteration to continue the knowledge retrieval.

After KnowledgeNavigator reaches the predicted number of hops, the triples retrieved from the knowledge graph can be combined into an effective reasoning path (e.g. Babaloo Mandel - written_by - Splash - starred_actors - Dianne Wiest) and further into a reasoning sub-graph. Taking the knowledge about "Splash" as an example, KnowledgeNavigator first combines the triples into "[Splash, starred_actors, [Daryl Hannah, Tom Hanks]]" and then converts it into "The actors starred in Splash are: Daryl Hannah and Tom Hanks" with a template. All retrieved triples are finally concatenated into a string and fed to the LLM as part of the answer-generation prompt.

Experiments

Dataset

To test the ability of KnowledgeNavigator on multi-hop knowledge graph reasoning tasks, it is evaluated on two datasets: MetaQA [33] and WebQSP [34]. For the KGQA task on both datasets, Hits@1 is used as the evaluation metric for the correctness of the answers generated by the LLM, following previous works [35,36,37,38].

MetaQA is a large-scale KGQA dataset in the movie domain that provides a knowledge graph with 43k entities, 9 relations, 135k triples, and 407k questions. The question set is extracted from the Facebook MovieQA dataset, containing questions that require 1-hop to 3-hop reasoning away from the head entities. Each question consists of a head entity, a reasoning path, and the answer entities. To verify KnowledgeNavigator’s multi-hop reasoning capability, the 2-hop and 3-hop vanilla datasets in MetaQA are used for experiments.

WebQSP is a benchmark with fewer questions but a large-scale knowledge graph, which can effectively evaluate the large-scale search ability of KnowledgeNavigator. WebQSP provides questions requiring up to 2 hops of reasoning over Freebase; each question contains a topic entity, constraints, inferential chains, and a SPARQL query for finding the answer. The base knowledge graph is set up with the latest version of the Freebase data dump provided by Google, which includes 3.12B triples [39]. WebQSP provides 4737 questions; the 11 questions that have no gold answers are removed.

Table 1 The performance of KnowledgeNavigator and baselines on MetaQA and WebQSP. The best result in each block is in bold

Baselines

To evaluate the effectiveness of KnowledgeNavigator in reasoning on knowledge graphs, it is compared with a set of well-known baseline models in the field of KGQA, all of which are fully supervised. These baselines are divided into two categories based on their retrieval method: (1) Embedding-based methods: KV-Mem [35], EmbedKGQA [40], NSM [41], Transfernet [42]. (2) Retrieval-augmented methods: GraftNet [36], CBR-SUBG [43]. All of these baselines were evaluated on both MetaQA and WebQSP. In addition, UniKGQA [44], KAPING [37], TOG [38] and StructGPT [45] are included as baselines that use LLMs for KGQA tasks. These frameworks all perform knowledge retrieval and question reasoning with un-fine-tuned LLMs.

LLama-2-70B-Chat [46] and ChatGPT are also applied as pure large language model baselines. The same template is used to prompt both models; the only distinction between these baselines and KnowledgeNavigator is the external knowledge retrieved.

Experiment details

KnowledgeNavigator is decoupled from the LLM, so any LLM can be used as a plug-in reasoning component. ChatGPT and LLama-2-70B-Chat are used as the LLM components in the experiments. ChatGPT is called through the OpenAI API. LLama-2-70B-Chat is deployed locally on 4 NVIDIA A100 80 GB GPUs without quantization, thus avoiding model quality loss. The context length is set to the default 4096, and the maximum number of tokens to generate per output sequence is set to 1024. A bert-base-uncased model with a linear classifier is fine-tuned on the training set of each dataset for hop prediction.

For both datasets, KnowledgeNavigator generates two variants for each question, and hop prediction is conducted within the range of 1 to 3. On MetaQA, KnowledgeNavigator performs weighted ranking on the top-1 relation for each (question variant, entity, relations) group and selects the top-1 ranked result for the next iteration. For WebQSP, these two parameters are set to top-2. In the few-shot scenario, the prompt includes two examples from the training set of the same dataset, in the same format as the target task.

Main results

Fig. 3 Performance of KnowledgeNavigator with different numbers of similar questions on MetaQA and WebQSP

Fig. 4 Performance of KnowledgeNavigator with different knowledge formats on MetaQA and WebQSP

Table 1 shows the performance of KnowledgeNavigator and the baselines on the KGQA datasets. Through optimized knowledge retrieval and answer reasoning, KnowledgeNavigator achieves an impressive accuracy of 99.5% on MetaQA 2-hop with LLama-2-70B-Chat, and 95.0% and 83.5% on MetaQA 3-hop and WebQSP with ChatGPT. KnowledgeNavigator outperforms all baseline models on WebQSP and surpasses all other LLM-based methods on all three datasets.

First, LLMs can answer some KGQA questions without relying on external knowledge, and even outperform KV-Mem on the WebQSP benchmark. However, a significant performance gap remains between the LLMs and state-of-the-art KGQA models, suggesting that LLMs struggle to reason about and answer complex questions using only their internal knowledge.

TOG and StructGPT retrieve question-related knowledge from knowledge graphs to assist LLM reasoning and achieve better performance on KGQA tasks. However, StructGPT only applies the LLM to knowledge retrieval and directly uses the tail entity of the retrieved triples as the answer; this approach ignores the underlying reasoning logic between triples and achieves only limited performance. TOG requires the LLM to judge whether the knowledge at each hop meets the requirements of the question, which not only accumulates errors but also significantly increases the time cost. In contrast, KnowledgeNavigator effectively considers the semantic relationships between questions, entities, relations, and retrieval history through multi-hop sequential retrieval. It outperforms the best LLM-based results by 2.2%, 8%, and 7.3% on the three datasets, respectively, while keeping the retrieval time cost within an acceptable range.

The embedding-based methods achieve multi-hop reasoning mainly through the similarity of embeddings between questions and entities. Such a model must run computations over the entire knowledge graph for each new question, which results in high retrieval complexity and low accuracy. The retrieval-augmented methods reduce reasoning complexity by retrieving relevant subgraphs from the knowledge graph and reasoning over the subgraphs instead. However, these methods are fully supervised and difficult to extend to other applications without retraining. As a general framework, KnowledgeNavigator can be combined with any knowledge graph and LLM without retraining or fine-tuning. This allows it to use the latest knowledge in real time, with better versatility and generalization, while achieving performance comparable to fully supervised models.

Ablation study

An ablation study is performed on KnowledgeNavigator to analyze the impact of the number of similar questions and the form of knowledge representation. LLama-2-70B-Chat and ChatGPT are used as the backbone LLMs with the same prompt template and 2-shot examples in all cases. Figures 3 and 4 show the results of the ablation study.

Impact of number of similar questions

Figure 3 shows the accuracy when 0, 2, or 4 similar questions participate in the voting for relation selection. The performance trends with LLama-2-70B-Chat and ChatGPT as backbones are similar across different numbers of similar questions. For the relatively simple MetaQA tasks, the LLM alone can already understand the original question and select the next-hop relation correctly, even without similar questions, especially on MetaQA 2-hop. More voters nevertheless make relation selection more stable, so with both backbones the performance of KnowledgeNavigator improves as the number of similar questions participating in the voting increases. On WebQSP, the low quality of the original questions limits KnowledgeNavigator's ability to discover the correct retrieval paths, so similar questions bring greater performance improvements: with two similar questions, KnowledgeNavigator achieves accuracy improvements of 4.4% and 4.5% with LLama-2-70B-Chat and ChatGPT, respectively. However, the semantic ambiguity of the original questions also makes the voting unstable; with LLama-2-70B-Chat, additional similar questions increase voting errors.

Moreover, since the number of KnowledgeNavigator's requests to the LLM rises linearly with the number of similar questions, there is a balance to be struck between computational cost and effectiveness. In our experiments, the number of similar questions is set to 2 by default to control the computational cost.

Impact of knowledge formats

Figure 4 shows the impact of different knowledge representation forms on the performance of KnowledgeNavigator. In this part, the LLMs are prompted with different representation forms of the same knowledge. Specifically, for "w/ Individual Triples" and "w/ Linked Triples", all triples are concatenated into a string in the form [head, relation, tail] or \([head, relation, [tail_1, \ldots , tail_n]]\); for "w/ Individual Sentences", each triple is converted into a separate natural language sentence using a template and the sentences are concatenated into a string, as illustrated in the sketch below.
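For concreteness, the three formats can be built from the same two Fig. 2 triples as follows; only the presentation differs, while the content is identical.

```python
# The three knowledge formats compared in Fig. 4, built from the same triples.
triples = [("Splash", "starred_actors", "Daryl Hannah"),
           ("Splash", "starred_actors", "Tom Hanks")]

individual_triples = " ".join(f"[{h}, {r}, {t}]" for h, r, t in triples)
linked_triples = "[Splash, starred_actors, [Daryl Hannah, Tom Hanks]]"
individual_sentences = " ".join(
    f"The {r.replace('_', ' ')} of {h} is(are): {t}." for h, r, t in triples)

print(individual_triples)    # w/ Individual Triples
print(linked_triples)        # w/ Linked Triples
print(individual_sentences)  # w/ Individual Sentences
```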

Across knowledge formats, the performance of KnowledgeNavigator increases with the logical closeness of the knowledge representation. First, for both triples and sentences, aggregated knowledge effectively reduces redundant information and increases the knowledge density of prompts, thereby reducing the difficulty of reasoning. Second, for general LLMs that have not been fine-tuned, knowledge in natural language format avoids the errors caused by their limited understanding of structured data.

Of the two backbones, ChatGPT demonstrates a stronger ability to comprehend structured information, enabling effective reasoning even with low-knowledge-density triples. Moreover, with the same knowledge format, ChatGPT outperforms LLama-2-70B-Chat on MetaQA 3-hop and WebQSP thanks to its stronger reasoning ability.

Error analysis

Fig. 5 Distribution of 100 random error samples on each dataset

To analyze the causes of errors in KnowledgeNavigator, 100 error samples are randomly extracted from the results on each dataset (MetaQA 2-hop, MetaQA 3-hop, and WebQSP). The errors are manually analyzed and classified into four categories:

  1. Relation selection error: Wrong relations are selected in the Knowledge Retrieval stage, resulting in the failure to retrieve the correct knowledge.

  2. Reasoning error: KnowledgeNavigator retrieves the correct knowledge but performs wrong reasoning in answer generation.

  3. Hallucination: KnowledgeNavigator does not generate answers based on the retrieved external knowledge.

  4. Other errors: Including intermediate errors causing search interruption, excessively long context leading to knowledge truncation, etc.

Figure 5 shows the error analysis results on the three datasets, whose error distributions differ. Reasoning error is the main error type on MetaQA 2-hop, accounting for 79%, while relation selection error is the main error type on MetaQA 3-hop and WebQSP, accounting for 95% and 69% respectively. This is because the questions in MetaQA 3-hop and WebQSP are semantically more complex. For MetaQA, the 3-hop questions feature longer reasoning paths and more intricate knowledge sub-graphs, and the inherent limitations in the reasoning capabilities of LLMs restrict the accuracy of reasoning and relation selection. For WebQSP, each entity in Freebase is associated with numerous similar and imprecisely articulated relations, which complicates the LLM's task of understanding and selecting the most relevant relations for the next iteration. Meanwhile, since the reasoning logic of MetaQA 2-hop questions is more straightforward, the LLM rarely selects wrong relations or produces wrong reasoning results; hallucinations and other errors therefore do not appear in the samples.

According to the error statistics, the performance of LLMs on KGQA tasks can be further improved through targeted optimization. Specifically, relation selection can be improved by enriching the semantics of the questions or strengthening the connection between the knowledge graph and the reasoning path, and reasoning can be improved by refining the prompt and the knowledge representation.

Complexity analysis

To illustrate the ability and generalization of KnowledgeNavigator in handling different QA tasks, the complexity and cost of retrieval are summarized into the following categories:

  1. Graph density: Increased density of the knowledge graph leads to a greater number and complexity of relations between nodes. This can weaken the significance of semantic differences between relations and thereby challenge relation selection in multi-hop reasoning. This is a common difficulty in other KGQA studies as well; in KnowledgeNavigator, the voting mechanism based on similar questions and the beam-like search greatly alleviate this problem.

  2. Complexity of questions: The complexity of a question is proportional to the number of hops required for KnowledgeNavigator to retrieve knowledge in the graph. This is an inevitable cost of multi-hop reasoning.

  3. LLM response time: KnowledgeNavigator calls the LLM multiple times to generate similar questions and select the optimal relations during knowledge retrieval, so the total retrieval time depends on the response time of the LLM. However, the generality of KnowledgeNavigator allows it to be applied directly to different knowledge graphs and question types without data preprocessing or model retraining, making it time-efficient in practical applications.

Conclusion

This paper studies the challenge of knowledge limitations in LLMs and introduces KnowledgeNavigator to improve their reasoning and question-answering capabilities on knowledge graphs. KnowledgeNavigator consists of three stages: Question Analysis, Knowledge Retrieval, and Reasoning. During question analysis, KnowledgeNavigator pre-analyzes the question and generates variants of it to assist reasoning. Then, guided by the LLM, it iteratively retrieves and filters candidate entities and relations within the knowledge graph to extract relevant external knowledge. Finally, this knowledge is transformed into an effective prompt to improve the LLM's performance on knowledge-intensive tasks. KnowledgeNavigator is evaluated with KGQA metrics, and the results indicate that introducing external knowledge from the knowledge graph benefits LLMs in handling complex tasks. KnowledgeNavigator outperforms other frameworks that deploy LLMs for enhanced KGQA and achieves performance comparable to previous fully supervised models. An ablation study confirms the effectiveness of each component, and the errors are analyzed. KnowledgeNavigator is a general framework, but its performance depends on the natural language understanding and reasoning capabilities of the LLM, as well as its response latency. Reducing the complexity of reasoning to adapt to more scenarios is therefore a promising direction for future research.