GNNQ: A Neuro-Symbolic Approach to Query Answering over Incomplete Knowledge Graphs

Real-world knowledge graphs (KGs) are usually incomplete, that is, they miss some facts representing valid information. So, when applied to such KGs, standard symbolic query engines fail to produce answers that are expected but not logically entailed by the KGs. To overcome this issue, state-of-the-art ML-based approaches first embed KGs and queries into a low-dimensional vector space, and then produce query answers based on the proximity of the candidate entity and the query embeddings in the embedding space. This allows embedding-based approaches to obtain expected answers that are not logically entailed. However, embedding-based approaches are not applicable in the inductive setting, where KG entities (i.e., constants) seen at runtime may differ from those seen during training. In this paper, we propose a novel neuro-symbolic approach to query answering over incomplete KGs applicable in the inductive setting. Our approach first symbolically augments the input KG with facts representing parts of the KG that match query fragments, and then applies a generalisation of the Relational Graph Convolutional Networks (RGCNs) to the augmented KG to produce the predicted query answers. We formally prove that, under reasonable assumptions, our approach can capture an approach based on vanilla RGCNs (and no KG augmentation) using an often substantially smaller number of layers. Finally, we empirically validate our theoretical findings by evaluating an implementation of our approach against the RGCN baseline on several dedicated benchmarks.


Introduction
Knowledge graphs (KGs) are databases where information is represented as a collection of entities and relations between them [13], or, equivalently, as a set of (function-free) first-order facts. Query answering is a fundamental reasoning task on KGs, which requires identifying all (tuples of) entities in a KG that satisfy a specific formal expression, called a query. For example, the (conjunctive) query $q(x) = \exists y_1, y_2.\ \mathrm{almaMater}(x, y_1) \land \mathrm{professorAt}(y_1, y_2)$ finds, in a KG, all the universities that are the alma maters of persons working as professors.
Queries can be answered over KGs using symbolic, logic-based engines for query languages such as SPARQL and Cypher [16]. This approach, however, is challenged by the problem that many real-life KGs are incomplete, in the sense that true facts that may be relevant for answering a particular query are missing from the KG. For example, if a KG contains the fact professorAt(edith, berkeley), representing that Edith is a professor at UC Berkeley, but is missing the fact almaMater(melbourne, edith), representing that the University of Melbourne is the alma mater of Edith, then melbourne will not be returned as an answer to the above query, even though the user may expect this answer.
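To make this failure mode concrete, the sketch below evaluates the example query $q(x)$ symbolically over a toy KG. It is an illustration only, not part of the approach: facts are encoded as (predicate, argument-tuple) pairs, a representation chosen here for convenience, and the hypothetical function `answers` hard-codes this one query.

```python
# Facts are (predicate, args) pairs; this encoding is an illustrative choice.
KG = {("professorAt", ("edith", "berkeley"))}


def answers(kg):
    """Answers to q(x) = ∃y1, y2. almaMater(x, y1) ∧ professorAt(y1, y2) over kg."""
    # Every y1 that is a professor somewhere (witnessed by some y2).
    professors = {args[0] for pred, args in kg if pred == "professorAt"}
    # x is an answer if it is the alma mater of such a y1.
    return {args[0] for pred, args in kg if pred == "almaMater" and args[1] in professors}


print(answers(KG))  # set(): the missing fact hides melbourne
completed = KG | {("almaMater", ("melbourne", "edith"))}
print(answers(completed))  # {'melbourne'}
```

A purely symbolic engine behaves like `answers`: it can only return melbourne once the missing fact is actually present.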
Query Embedding (QE) approaches have been proposed as a way to overcome this limitation [4,9,11,17,18,20]. QE approaches embed KGs and monadic conjunctive queries jointly in a low-dimensional vector space, and then evaluate the likelihood of candidate answers according to their distance to the query embedding in the embedding space. These methods can produce answers that may be of interest to the user, even if they correspond to parts of the KG that only partially match the query. However, to the best of our knowledge, existing QE approaches are only applicable in the transductive setting, where trained models can process only KGs whose entities were all seen during training. An increasing number of applications, however, require an inductive setting [10,14,23,25], where unseen entities are also allowed.
Relational Graph Convolutional Networks (RGCNs) [19] are a class of graph neural networks (GNNs) which take as input directed labelled multigraphs, that is, graphs with nodes connected by coloured edges and annotated with real-valued feature vectors. When applied to such a multigraph, an RGCN updates, in each layer, the feature vector of each node by combining, by means of learned parameters, the node's feature vector in the previous layer with the previous-layer vectors of the node's neighbours. If the vector in the final layer is a single Boolean value, then the RGCN can be seen as a (binary) node classifier. RGCNs can be used to answer monadic queries on a KG: first, encode the KG as a directed multigraph with a node for each entity in the KG; then, run a trained RGCN on the multigraph to predict whether each entity is an answer to the query or not (similar approaches have been used for the related problem of KG completion [10,14,22-24]). This method has three properties that make it suitable for answering queries on incomplete KGs in an inductive setting.
1. Inductive Capabilities. RGCNs do not use entity-specific parameters, so they can be applied to KGs mentioning entities not seen during training.
2. Expressivity. Recent theoretical analysis of RGCNs [5] shows that, for every monadic tree-shaped conjunctive query, there exists an RGCN that exactly captures this query, that is, for each KG, the answers provided by the RGCN on the KG are the same as the real query answers over the KG.
3. Noise Tolerance. Similarly to other ML approaches, RGCNs can produce relevant query answers even if such answers do not have exact matches in the input KG (e.g., due to missing information).
A key limitation of using RGCNs for query answering over KGs, however, is that, in order to recognise a part of the KG relevant to a query answer, any RGCN requires at least as many layers as the length of the longest (simple) path in the query to an answer variable. Empirical results have shown that GNNs with many layers often fail to learn long-range dependencies and suffer from several problems, such as over-smoothing [12]. This problem persists even if the input KGs have no missing information.
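The layer-count argument rests on the fact that, after $L$ rounds of message passing, a node's vector can depend only on nodes at most $L$ hops away. The sketch below computes this receptive field for an unweighted graph whose edges are traversed in both directions (as with inverse edges). It deliberately ignores learned parameters, so it is a simplification for intuition, not an RGCN; the function name is hypothetical.

```python
def receptive_field(edges, node, layers):
    """Nodes whose initial features can influence `node` after `layers` rounds."""
    neighbours = {}
    for u, v in edges:
        # Treat every edge as bidirectional, mirroring inverse edges.
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    field = {node}
    for _ in range(layers):
        # One round of message passing extends the field by one hop.
        field |= {w for u in field for w in neighbours.get(u, ())}
    return field


# A path x - y1 - y2 - y3, like a query path of length 3 from the answer variable.
chain = [("x", "y1"), ("y1", "y2"), ("y2", "y3")]
print(sorted(receptive_field(chain, "x", 2)))  # ['x', 'y1', 'y2']
print("y3" in receptive_field(chain, "x", 3))  # True
```

With two layers the node for x never receives information from y3, so no two-layer model of this kind can check an atom attached to y3.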
To address this limitation, we propose in this paper a novel neuro-symbolic approach to inductive query answering over incomplete KGs. Our approach first augments an input KG using a set of logical (i.e., symbolic) rules extracted from the query. The application of a rule to a KG adds new facts that represent (complete) parts of the KG matching connected query fragments. The approach then encodes the augmented KG as a coloured hypergraph and processes this hypergraph using a novel neural architecture called Hyper-Relational Graph Convolutional Network (HRGCN), which generalises vanilla RGCNs to coloured hypergraphs. We then prove that, under mild and reasonable assumptions, our approach can emulate the baseline approach that relies on vanilla RGCNs (without KG augmentation) using significantly fewer layers. Finally, we present an implementation of our approach in a system called GNNQ and evaluate it against a baseline without augmentation on nine novel benchmarks for inductive query answering over incomplete KGs. Our results show that instances of GNNQ can be effectively trained and deployed in practice; moreover, they outperform the baselines, even when the latter use more layers.

Preliminaries
In this paper, we rely on a standard formalisation of knowledge graphs (and related concepts) in first-order logic.
Let us consider disjoint countable sets of predicates, constants, and variables, where each predicate is assigned a natural number called arity. A $k$-ary atom, with $k \in \mathbb{N}$, is an expression of the form $P(\bar{t})$, where $P$ is a $k$-ary predicate and $\bar{t} = t_1, \dots, t_k$ is a $k$-tuple of constants and variables. A fact is a variable-free atom. A dataset is a finite set of facts. A knowledge graph (KG) is a dataset containing only unary and binary facts. So, entities in a KG are represented by constants, while classes of entities and relations between them are represented by unary and binary facts, respectively. Let $\mathrm{Const}(D)$ and $\mathrm{Pred}(D)$ denote the constants and predicates mentioned in a dataset $D$, respectively.
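These definitions translate directly into a small data representation. The following sketch, in which the (predicate, argument-tuple) encoding and the helper names are illustrative choices, computes Const(D) and Pred(D) and checks whether a dataset is a KG.

```python
def constants(dataset):
    """Const(D): all constants mentioned in the facts of D."""
    return {c for _, args in dataset for c in args}


def predicates(dataset):
    """Pred(D): all predicates mentioned in D."""
    return {pred for pred, _ in dataset}


def is_kg(dataset):
    """A KG is a dataset containing only unary and binary facts."""
    return all(len(args) in (1, 2) for _, args in dataset)


D = {("Professor", ("edith",)), ("professorAt", ("edith", "berkeley"))}
print(sorted(constants(D)), sorted(predicates(D)), is_kg(D))
# ['berkeley', 'edith'] ['Professor', 'professorAt'] True
```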
A conjunctive query (CQ) with (a tuple of) answer variables $\bar{x}$ is a formula $q(\bar{x}) = \exists \bar{y}.\, \varphi(\bar{x}, \bar{y})$, where the body $\varphi(\bar{x}, \bar{y})$ is a conjunction of atoms over variables $\bar{x}, \bar{y}$. A tuple $\bar{a}$ of constants is an answer to $q(\bar{x})$ over a dataset $D$ if there is a homomorphism from $q(\bar{a})$ to $D$, that is, an assignment $h$ of constants to $\bar{y}$ such that each atom in $\varphi(\bar{a}, h(\bar{y}))$ is in $D$. Let $q[D]$ denote the set of all answers to $q(\bar{x})$ over $D$. In this paper, we concentrate on tree-shaped CQs (tree-CQs), that is, constant-free CQs over unary and binary predicates with one answer variable such that the primal pseudograph of the CQ's body is a tree; here, the primal pseudograph of a conjunction of atoms is the undirected pseudograph whose nodes are the variables of the conjunction and which has an edge between (not necessarily distinct) $z_1$ and $z_2$ for each binary atom $R(z_1, z_2)$ in the conjunction. We call a primal pseudograph a primal tree if it is a tree. The height of a tree-CQ is the height of its primal tree with the answer variable as the root.
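For intuition, the answers $q[D]$ of a CQ can be computed by brute force: enumerate all assignments of constants to the variables and keep those that map every body atom into $D$. The sketch below does exactly that. It is exponential in the number of variables and only illustrates the homomorphism-based semantics; it is not a practical query engine. Atoms are (predicate, variable-tuple) pairs, an encoding assumed for illustration, and `cq_answers` is a hypothetical name.

```python
from itertools import product


def cq_answers(body, answer_var, existential_vars, dataset):
    """q[D] for a CQ with one answer variable, by enumerating all assignments."""
    consts = sorted({c for _, args in dataset for c in args})
    variables = [answer_var, *existential_vars]
    result = set()
    for values in product(consts, repeat=len(variables)):
        h = dict(zip(variables, values))
        # h is a homomorphism if every instantiated body atom is a fact of D.
        if all((pred, tuple(h[v] for v in args)) in dataset for pred, args in body):
            result.add(h[answer_var])
    return result


body = [("almaMater", ("x", "y1")), ("professorAt", ("y1", "y2"))]
D = {
    ("almaMater", ("melbourne", "edith")),
    ("professorAt", ("edith", "berkeley")),
    ("almaMater", ("toronto", "bob")),  # bob is not a professor anywhere
}
print(cq_answers(body, "x", ["y1", "y2"], D))  # {'melbourne'}
```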

Inductive Query Answering over Incomplete KGs
We are interested in the problem of finding the answers to a given (known in advance) tree-CQ over KGs that may be incomplete, that is, missing (relevant) information. In particular, we assume that each KG has a completion, that is, a larger (or identical) KG that may include additional facts, which are 'missing' in the original KG. We consider the setting where all the constants in the completion facts are already mentioned in the original KG. However, we assume that the function that maps a KG to its completion is unknown; instead, only partial knowledge about this function is provided to a system in the form of examples, each of which consists of a KG, a constant, and a Boolean value, which tells whether the constant is an answer to the tree-CQ over the completion of the KG. Finally, our setting is inductive [10,14,23,25], which means that there exists a finite, known-in-advance set of predicates used in all KGs, their completions, and the tree-CQ, but the constants in different KGs may be different.
We are now ready to formalise the ML task of inductive tree-CQ answering over incomplete KGs, which we call the IQA task for brevity.

Definition 1. Given a finite set Pred of unary and binary predicates, and a tree-CQ $q(x)$ that uses only predicates from Pred, let us assume a hidden completion function $\cdot^*$ mapping each KG $K$ with $\mathrm{Pred}(K) \subseteq \mathrm{Pred}$ to another KG $K^*$ with $\mathrm{Pred}(K^*) \subseteq \mathrm{Pred}$, called the completion of $K$, such that $K \subseteq K^*$ and $\mathrm{Const}(K^*) = \mathrm{Const}(K)$. Then, the IQA task is to learn a function $g_q$ mapping each KG $K$ with $\mathrm{Pred}(K) \subseteq \mathrm{Pred}$ to the set $q[K^*]$ of answers to $q(x)$ over $K^*$.

Example 1. Let $q(x)$ be the tree-CQ which asks for all universities that are the alma maters of professors who were supervised by Nobel Prize winners, and let $K$ be the KG {supervisedBy(alice, roger), supervisedBy(daniel, carol), wonNobel(roger, physics), wonNobel(carol, medicine), professorAt(alice, oxford), professorAt(daniel, oxford), Fig. 1). The desired function $g_q$ for $q(x)$ should return the set {shanghai, toronto} of answers when applied to $K$, because both toronto and shanghai are answers to $q(x)$ over $K^*$. Note, however, that shanghai is not an answer to $q(x)$ over $K$, since the fact almaMater(shanghai, alice) is missing from $K$.

Neuro-Symbolic Approach to the IQA Task
In this section, we describe our approach for solving the IQA task. For the remainder of this section, let us fix a (possibly empty) set $\mathrm{Pred}_1 = \{A_1, \dots, A_m\}$ of unary predicates, a finite set $\mathrm{Pred}_2$ of binary predicates, and a tree-CQ $q(x) = \exists \bar{y}.\, \varphi(x, \bar{y})$ over predicates in $\mathrm{Pred}_1 \cup \mathrm{Pred}_2$. For technical reasons, we assume that the variables $x, \bar{y}$ are ordered following a breadth-first traversal of the primal tree of $\varphi(x, \bar{y})$. This assumption is without loss of generality, since, given an arbitrary tree-CQ, we can always construct a semantically equivalent query that satisfies our requirement by reordering $\bar{y}$. Finally, for each $R \in \mathrm{Pred}_2$, we consider a fresh binary predicate $\bar{R}$, which we call the inverse of $R$, and we let $\mathrm{Pred}_2^+$ denote the set $\mathrm{Pred}_2 \cup \{\bar{R} \mid R \in \mathrm{Pred}_2\}$. Our approach is divided into three steps. In the first step, described in Sect. 4.1, the input KG is augmented with new facts that will assist our ML model in recognising parts of the input KG that match selected query fragments. In the second step, described in Sect. 4.2, our approach encodes the augmented KG into a data structure suitable for our ML model, namely, a coloured labelled (multi-)hypergraph, where nodes correspond to constants in the KG and edges to non-unary atoms. In the third and final step, described in Sect. 4.3, the approach processes the coloured hypergraph by means of a generalisation of RGCNs. The output of this process is a Boolean value for each node in the hypergraph, representing whether the constant associated with this node is predicted as an answer to $q(x)$ over the completion of the input KG.
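The breadth-first ordering assumption is easy to realise. Using the (predicate, variable-tuple) atom encoding and a hypothetical helper `bfs_variable_order`, the sketch below orders the variables of a tree-CQ body by a breadth-first traversal of its primal tree rooted at the answer variable. The example body is an assumed formalisation of the query in Example 1, which the text describes only in words; breadth-first orders are not unique, and here ties are broken by the order of the atoms.

```python
from collections import deque


def bfs_variable_order(body, answer_var):
    """Order the variables of a tree-CQ body breadth-first from the answer variable."""
    adjacency = {}
    for _, args in body:
        for v in args:
            adjacency.setdefault(v, [])
        if len(args) == 2 and args[0] != args[1]:
            a, b = args
            adjacency[a].append(b)  # the primal tree is undirected
            adjacency[b].append(a)
    order, seen, queue = [], {answer_var}, deque([answer_var])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order


# Atoms listed in an arbitrary order, with variable names that are not yet BFS-ordered.
body = [("wonNobel", ("u", "w")), ("professorAt", ("v", "t")),
        ("supervisedBy", ("v", "u")), ("almaMater", ("x", "v"))]
print(bfs_variable_order(body, "x"))  # ['x', 'v', 't', 'u', 'w']
```

Renaming v, t, u, w to y1, y2, y3, y4 then yields a query whose existential variables follow the required order.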

Augmentation of Knowledge Graphs
As discussed in the introduction, vanilla RGCNs with Boolean outputs can be used to solve the IQA task by first encoding the input KG as a directed multigraph and then applying a trained RGCN to the encoding. Such RGCNs, however, may require a large number of layers to adequately capture the target function. Training RGCNs with many layers is expensive; moreover, the resulting models may have poor performance due to over-smoothing [12]. To address these issues, our procedure first augments the input KG with facts representing (complete) parts of the KG matching query fragments. As we prove in Sect. 5, this allows us to solve the IQA task using significantly fewer layers.
The KG augmentation relies on a set of logical rules, which correspond to fragments of the tree-CQ. These rules are applied to the KG to infer new facts, which are added to the KG. To formalise this step, we need some terminology.
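A (projection-free) rule is an expression of the form $H(\bar{z}) \leftarrow \psi(\bar{z})$, where the head $H(\bar{z})$ is an atom over a $|\bar{z}|$-ary predicate $H$, and the body $\psi(\bar{z})$ is a conjunction of atoms using variables $\bar{z}$ (i.e., each variable in $\bar{z}$ appears in at least one atom in $\psi(\bar{z})$, and there are no other variables in these atoms). The application of a set $\mathcal{R}$ of rules to a dataset $D$ is the dataset $\mathcal{R}(D) = D \cup \{H(\bar{a}) \mid H(\bar{z}) \leftarrow \psi(\bar{z}) \in \mathcal{R} \text{ and } \psi(\bar{a}) \subseteq D\}$, where $\psi(\bar{a})$ is the set of facts obtained from $\psi(\bar{z})$ by replacing $\bar{z}$ with a tuple $\bar{a}$ of constants. Note that in what follows we will only apply a rule to datasets that do not mention the head predicate of the rule.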
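Applying a set of projection-free rules amounts to finding all matches of each rule body and adding the corresponding head facts. The sketch below, with hypothetical helpers `matches` and `apply_rules` and the (predicate, tuple) encoding, does this by backtracking. As a usage example, it applies a rule whose body professorAt(y1, y2) ∧ supervisedBy(y1, y3) ∧ wonNobel(y3, y4) is our assumption for the rule r of the running example (its body is not spelled out in this text) to the facts listed in Example 1.

```python
def matches(body, dataset, assignment=None):
    """Yield every assignment of constants to variables mapping all body atoms into dataset."""
    assignment = {} if assignment is None else assignment
    if not body:
        yield dict(assignment)
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, consts in dataset:
        if fact_pred != pred or len(consts) != len(args):
            continue
        extended = dict(assignment)
        # Bind unbound variables; reject the fact if it clashes with earlier bindings.
        if all(extended.setdefault(v, c) == c for v, c in zip(args, consts)):
            yield from matches(rest, dataset, extended)


def apply_rules(rules, dataset):
    """D plus H(a) for every rule H(z) <- psi(z) and every match a of psi in D."""
    derived = {
        (head_pred, tuple(h[v] for v in head_vars))
        for head_pred, head_vars, body in rules
        for h in matches(body, dataset)
    }
    return set(dataset) | derived


K = {
    ("supervisedBy", ("alice", "roger")), ("supervisedBy", ("daniel", "carol")),
    ("wonNobel", ("roger", "physics")), ("wonNobel", ("carol", "medicine")),
    ("professorAt", ("alice", "oxford")), ("professorAt", ("daniel", "oxford")),
}
r = ("H", ("y1", "y2", "y3", "y4"),
     [("professorAt", ("y1", "y2")), ("supervisedBy", ("y1", "y3")),
      ("wonNobel", ("y3", "y4"))])
print(sorted(apply_rules([r], K) - K))
# [('H', ('alice', 'oxford', 'roger', 'physics')), ('H', ('daniel', 'oxford', 'carol', 'medicine'))]
```

Because head predicates are fresh and never occur in the input dataset, a single pass over the rules suffices; no fixpoint iteration is needed.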
Next, we associate a set $\mathcal{R}_q$ of rules to our fixed tree-CQ $q(x)$. Specifically, we define $\mathcal{R}_q$ as the set of all the rules $H(\bar{z}) \leftarrow \psi(\bar{z})$, where $\psi(\bar{z})$ is a sub-conjunction of $\varphi(x, \bar{y})$, with the variables in $\bar{z}$ in the same order as in $x, \bar{y}$, such that
- the primal pseudograph of $\psi(\bar{z})$ is connected (and hence it is a tree), and
- the height of this tree is at least 2,
and where $H$ is a fresh $|\bar{z}|$-ary predicate uniquely associated with $\psi(\bar{z})$. Subsequently, we use $\mathrm{Pred}_q$ to denote the set of head predicates of the rules in $\mathcal{R}_q$. Note that, by our assumptions on the order of variables, the first variable in $\bar{z}$ will always be the one closest to $x$ in the primal tree of $\varphi(x, \bar{y})$ rooted at $x$. Moreover, the assumptions ensure that $\mathcal{R}_q$ does not contain rules with the same body and head predicate but different heads; this eliminates redundancy by preventing augmentation with multiple facts identifying the same sub-KGs.
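The candidate rule bodies of $\mathcal{R}_q$ are the connected sub-conjunctions of the query body. The brute-force sketch below, with hypothetical helper names, enumerates them and orders each variable tuple $\bar{z}$ according to a given order of $x, \bar{y}$. It leaves out the requirement that the height be at least 2, which would be applied as a final filter. The example body is again an assumed formalisation of the query of Example 1.

```python
from itertools import combinations


def is_connected(atoms):
    """Whether the primal pseudograph of a conjunction of atoms is connected."""
    variables = {v for _, args in atoms for v in args}
    adjacency = {v: set() for v in variables}
    for _, args in atoms:
        if len(args) == 2:
            a, b = args
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(variables))
    seen, stack = {start}, [start]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == variables


def connected_subconjunctions(body, order):
    """All (z, psi) with psi a connected sub-conjunction of body and z ordered by `order`."""
    rank = {v: i for i, v in enumerate(order)}
    result = []
    for size in range(1, len(body) + 1):
        for atoms in combinations(body, size):
            if is_connected(atoms):
                z = tuple(sorted({v for _, args in atoms for v in args}, key=rank.__getitem__))
                result.append((z, atoms))
    return result


body = [("almaMater", ("x", "y1")), ("professorAt", ("y1", "y2")),
        ("supervisedBy", ("y1", "y3")), ("wonNobel", ("y3", "y4"))]
subs = connected_subconjunctions(body, ["x", "y1", "y2", "y3", "y4"])
print(len(subs))  # 12
```

Each returned pair corresponds to one rule $H(\bar{z}) \leftarrow \psi(\bar{z})$ with a fresh head predicate $H$; for instance, the last three atoms yield $\bar{z} = (y_1, y_2, y_3, y_4)$.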
As discussed in Sect. 6.1, in our experiments we observe that it is often better not to use all rules in $\mathcal{R}_q$ in the augmentation step. We believe that there are two main reasons for this: first, increasing the number of augmentation facts appears to have diminishing returns, since different facts can represent similar parts of the input KG (satisfying similar query fragments); second, having a large number of augmentation facts mentioning the same constant can produce problems similar to over-smoothing. Therefore, we consider KG augmentations both with the full set $\mathcal{R}_q$ and with subsets of $\mathcal{R}_q$.

Definition 2.
The partial augmentation of a KG $K$ over $\mathrm{Pred}_1 \cup \mathrm{Pred}_2$ for the tree-CQ $q(x)$ with respect to rules $\mathcal{R}'_q \subseteq \mathcal{R}_q$ is the dataset $\mathcal{R}'_q(K)$. The (full) augmentation of $K$ is the partial augmentation with respect to all of $\mathcal{R}_q$.
of the body of $q(x)$ (see Fig. 2); its primal pseudograph is a tree of height 3. So, $\mathcal{R}_q$ contains the rule $r = H(y_1, y_2, y_3, y_4) \leftarrow \psi(y_1, y_2, y_3, y_4)$ for a fresh predicate $H$, and the partial augmentation of $K$ for $q(x)$ with respect to $\mathcal{R}'_q = \{r\}$ is $K \cup \{H(\text{alice}, \text{oxford}, \text{roger}, \text{physics}), H(\text{daniel}, \text{oxford}, \text{carol}, \text{medicine})\}$.

Encoding of Knowledge Graphs
We now describe our encoding of datasets into directed (multi-)hypergraphs where hyperedges are coloured and nodes are labelled by real-valued vectors. Specifically, our encoding introduces a hypergraph node for each constant in the input dataset; then, each fact of arity greater than 1 is encoded into a hyperedge of the colour corresponding to the fact's predicate, and each fact of arity 1 is encoded as a component of the feature vector labelling the corresponding node. Furthermore, for each binary fact in the original dataset with a predicate $R$, the encoding introduces, besides the $R$-coloured edge, an $\bar{R}$-coloured edge in the reverse direction; such edges ensure that our ML model propagates information in both directions whenever a binary fact connects two constants.

Definition 3. Given a finite set Col of colours, each with an arity, and a dimension $\delta \in \mathbb{N}$, a $(\mathrm{Col}, \delta)$-hypergraph $G$ is a triple $(V, E, \lambda)$, where $V$ is a finite set of nodes, $E$ is a set of directed hyperedges of the form $(v, c, (u_1, \dots, u_k))$ with $c \in \mathrm{Col}$ of arity $k + 1$ and $\{v, u_1, \dots, u_k\} \subseteq V$, and $\lambda$ is a labelling function that assigns a vector $\lambda(v) \in \mathbb{R}^\delta$ to each node $v \in V$.

Given a $(\mathrm{Col}, \delta)$-hypergraph $G = (V, E, \lambda)$, we denote, for brevity, the vector $\lambda(v)$ for a node $v$ with $\mathbf{v}$, and we refer to its $i$th element as $(\mathbf{v})_i$. Furthermore, for each $v \in V$ and $c \in \mathrm{Col}$, we define the $c$-neighbourhood $N^c_G(v)$ of $v$ in $G$ as the set $\{(u_1, \dots, u_k) \mid (v, c, (u_1, \dots, u_k)) \in E\}$. Now that we have our target graph structure, we can define our encoding function, which maps datasets (including augmented KGs) to hypergraphs.

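Definition 4. The encoding of a dataset $D$ over predicates $\mathrm{Pred}_1 \cup \mathrm{Pred}_2 \cup \mathrm{Pred}_q$ is the $(\mathrm{Col}, \delta)$-hypergraph $G_D = (V, E, \lambda)$, where $\mathrm{Col} = \mathrm{Pred}_2^+ \cup \mathrm{Pred}_q$ (i.e., the binary predicates, their inverses, and the head predicates of $\mathcal{R}_q$), $\delta = m + 1$, $V = \mathrm{Const}(D)$, $E$ contains the hyperedge $(a, R, (b_1, \dots, b_k))$ for every $R(a, b_1, \dots, b_k) \in D$ with $k \geq 1$ and the hyperedge $(b, \bar{R}, (a))$ for every $R(a, b) \in D$ with $R \in \mathrm{Pred}_2$, and $\lambda$ is the labelling that assigns, to each $a \in V$, the vector $\mathbf{a} \in \mathbb{R}^\delta$ such that, for each $i \in \{1, \dots, m\}$, $(\mathbf{a})_i = 1$ if $A_i(a) \in D$ and $(\mathbf{a})_i = 0$ otherwise, and $(\mathbf{a})_{m+1} = 1$. Note that the $(m+1)$th element of each vector $\mathbf{a}$ is always 1; this element is needed to cover the case $m = 0$, that is, when there are no unary predicates.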
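The encoding can be sketched as follows, following the prose description at the start of this subsection: one node per constant, one hyperedge per fact of arity at least 2 (from the first argument to the tuple of the remaining ones), a reverse edge for each binary fact, and a 0/1 feature per unary predicate followed by the constant final 1. Naming inverse colours with an `inv_` prefix is our own assumption, as are the helper names.

```python
def encode(dataset, unary_preds, binary_preds):
    """Encode a dataset of (pred, args) facts as a hypergraph (V, E, labels)."""
    nodes = sorted({c for _, args in dataset for c in args})
    edges = set()
    for pred, args in dataset:
        if len(args) >= 2:
            # Hyperedge from the first argument to the tuple of remaining arguments.
            edges.add((args[0], pred, tuple(args[1:])))
            if pred in binary_preds:
                # Reverse edge of the (hypothetically named) inverse colour.
                edges.add((args[1], "inv_" + pred, (args[0],)))
    labels = {
        a: [1.0 if (A, (a,)) in dataset else 0.0 for A in unary_preds] + [1.0]
        for a in nodes
    }
    return nodes, edges, labels


def neighbourhood(edges, v, colour):
    """N^c_G(v): tuples (u1, ..., uk) with (v, c, (u1, ..., uk)) in E."""
    return {us for w, c, us in edges if w == v and c == colour}


D = {("Professor", ("edith",)), ("professorAt", ("edith", "berkeley")),
     ("H", ("alice", "oxford", "roger", "physics"))}
V, E, labels = encode(D, ["Professor"], {"professorAt"})
print(labels["edith"], labels["berkeley"])  # [1.0, 1.0] [0.0, 1.0]
print(neighbourhood(E, "berkeley", "inv_professorAt"))  # {('edith',)}
```

Since the augmentation facts are just facts of higher arity, the same function encodes both plain and augmented KGs.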

Hyper-Relational Graph Convolutional Networks
We now introduce a generalised version of the RGCN [19] architecture that can process $(\mathrm{Col}, \delta)$-hypergraphs; we call this generalisation Hyper-Relational Graph Convolutional Network (HRGCN). Our approach uses an HRGCN to process the hypergraphs that are encodings of augmented KGs.
The result of applying the HRGCN to $G$ is the $(\mathrm{Col}, \delta)$-hypergraph $(V, E, \lambda_{\mathrm{bool}})$, where $\lambda_{\mathrm{bool}}$ labels every node $v \in V$ with $\mathrm{Cls}(\mathbf{v}_L)$. Subsequently, (G, v) denotes $\mathbf{v}_L$, and true[G] denotes the set of all $v \in V$ with $\lambda_{\mathrm{bool}}(v) = 1$.

Advantages of Knowledge Graph Augmentation
As discussed in the introduction, the main motivation for KG augmentation is to help ML models easily recognise parts of the input KG that match complete connected fragments of the query. In this section, we present a theorem that makes this conjecture precise. To this end, we assume a natural and broad class of completion functions, which arguably captures those one may expect to find in practice. Then, we show that, for each (big enough) tree-CQ $q$, if there is an instance of our approach capturing the goal function $g_q$ without using KG augmentation, then there exists an instance of the approach that also captures $g_q$ using (full) KG augmentation, but whose HRGCN has significantly fewer layers.

Definition 6. A completion function $\cdot^*$ over a set of predicates Pred is
- monotonic under homomorphisms if, for all KGs $K_1$ and $K_2$ over Pred and each homomorphism $h$ from $K_1$ to $K_2$, $h$ is also a homomorphism from $K_1^*$ to $K_2^*$, where a homomorphism $h$ from $K_1$ to $K_2$ is a mapping from $\mathrm{Const}(K_1)$ to constants such that $h(K_1) \subseteq K_2$;
- $s$-local, for $s \in \mathbb{N}$, if for every KG $K$ over Pred and every fact $\alpha \in K^*$ there is $K_\alpha \subseteq K$ such that $\alpha \in K_\alpha^*$ and $K_\alpha$ contains an undirected path (through constants and binary facts) of length at most $s$ from each constant in $K_\alpha$ to each constant in $\alpha$;
- $k$-incomplete for a tree-CQ $q(x)$ if for each KG $K$ over Pred and each answer
The intuition behind these notions is as follows. Monotonicity under homomorphisms requires that every fact in the completion of a KG should also appear (in a suitable form) in the completion of any KG that has the same structure as the original KG. Locality reflects the intuition that every fact in the completion is a consequence of a small neighbourhood of the fact in the original KG. Finally, incompleteness for a query means that, for every answer to the query, only a small number of facts can be missing in any 'witness' of it, that is, any part of the KG completion (fully) matching the query. We now state our main result; its proof can be found in the supplemental material.

Theorem 1. Let $q(x)$ be a tree-CQ of height $h$ over Pred and $\cdot^*$ be a completion function over Pred that is monotonic under homomorphisms, $s$-local, and $k$-incomplete for $q(x)$. If there is an $L$-layer (Pred
We emphasise that we expect many completion functions found in practice to have small values of $s$ and $k$, thus making $k(s+1)+1$ significantly smaller than $L$. Therefore, for large enough $L$, KG augmentation allows us to reduce the number of layers that an HRGCN instance in our approach requires to capture the goal function $g_q$, that is, to capture query $q$ on incomplete KGs.

Implementation and Evaluation
We have implemented our approach to the IQA task over incomplete KGs in a system called GNNQ, using Python 3.8.10, RDFLib 6.1.1, and PyTorch 1.11.0. We then evaluated, on a number of benchmarks, several instances $\mathrm{GNNQ}_L$ of GNNQ using KG augmentation, parametrised by the number $L$ of layers of the underlying HRGCN. To the best of our knowledge, no existing system can solve the IQA task (in particular, none can deal simultaneously with KG incompleteness, complex queries, and the inductive setting); thus, we compared the instances $\mathrm{GNNQ}_L$ against instances $\mathrm{GNNQ}^-_L$ of GNNQ that do not use KG augmentation, which we treat as baselines. Our experiments show that the $\mathrm{GNNQ}_L$ instances significantly outperform the $\mathrm{GNNQ}^-_L$ instances, even if the RGCNs underlying the latter use more layers. Thus, we conclude that KG augmentation can provide a significant advantage in solving the IQA task in practice. All experiments were performed on a machine equipped with an Intel® Core™ i9-10900K CPU, 64 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU, running Ubuntu 20.04.4.

Benchmarks
The existing benchmarks for query answering on KGs used in the QE literature [4,9,11,17,18,20] are designed for the transductive setting, so we cannot use them for an informative comparison of systems addressing the IQA task. Thus, in order to evaluate GNNQ instances, we have designed nine novel IQA benchmarks. Six of these, called WatDiv-Q$_i$ for $i \in \{1, \dots, 6\}$, are based on synthetic KGs generated with the WatDiv framework [3], and the remaining three, called FB15k237-Q$_i$ for $i \in \{1, 2, 3\}$, are based on subgraphs of FB15k-237 [6], a real-life KG commonly used in benchmarks for evaluating KG completion and QE systems. Each of our benchmarks provides the following:
- a set Pred of unary and binary predicates and a tree-CQ over Pred;
- sets of examples for training (including validation) and testing; each example is of the form $(K, a, \mathit{Ans})$, where $K$ is a KG over Pred, $a$ is a constant, and $\mathit{Ans} \in \{0, 1\}$ is the ground-truth answer.
The benchmarks are constructed so that the ground-truth answer of an example is 1 if and only if $a \in q[K^*]$, where $\cdot^*$ is a hidden completion function over Pred, which is not given as part of the benchmark. For all our benchmarks, $\cdot^*$ is defined by appropriately constructed Datalog rules [2]; this allows us to capture structural dependencies of KGs, which are well suited to the inductive setting [23]. Table 1 summarises the statistics of our nine benchmarks. Further details about the selection of queries, completion functions, and examples for each benchmark are provided in the supplemental material.

GNNQ Implementation
Using a set of predicates Pred and a tree-CQ $q(x)$ as parameters, each $\mathrm{GNNQ}_L$ processes a KG $K$ over Pred and a candidate constant $a \in \mathrm{Const}(K)$ by performing the following steps, implementing (and specifying) our approach.
Step 1. Each $\mathrm{GNNQ}_L$ computes a partial augmentation $\mathcal{R}'_q(K)$ of $K$ with respect to some subset $\mathcal{R}'_q \subseteq \mathcal{R}_q$ specified as follows: for the FB15k237-Q$_i$ benchmarks, we take $\mathcal{R}'_q = \mathcal{R}_q$; in contrast, for the WatDiv-Q$_i$ benchmarks, we take $\mathcal{R}'_q$ as the subset of all rules in $\mathcal{R}_q$ with at most 4 variables. We selected such $\mathcal{R}'_q$ because, on the one hand, the FB15k237-Q$_i$ benchmarks are relatively irregular, so we expect that even full augmentation will generate only a relatively small number of augmentation facts; on the other hand, the WatDiv-Q$_i$ benchmarks are highly regular, which suggests that full augmentation may hamper performance, as it would derive many similar facts, which may cause problems analogous to over-smoothing. Each $\mathrm{GNNQ}_L$ then encodes $\mathcal{R}'_q(K)$ as a $(\mathrm{Col}, \delta)$-hypergraph $G_{\mathcal{R}'_q(K)}$ with appropriate Col and $\delta$ (see Sect. 4.2).
Step 2. Each $\mathrm{GNNQ}_L$ applies, to $G_{\mathcal{R}'_q(K)}$, a $(\mathrm{Col}, \delta)$-HRGCN with $L$ layers, dimensions $(\delta_0, \dots, \delta_L)$ such that $\delta_0 = \delta$ and $\delta_L = 1$, and the following components. Functions $\mathrm{Aggr}_\ell$ and $\mathrm{Comb}_\ell$ for each layer $\ell \in \{1, \dots, L\}$ are defined so that the feature vector of each node $v$ is updated as where $\sigma_\ell$ is an element-wise leaky ReLU for each $\ell \in \{1, \dots, L-1\}$ and the element-wise sigmoid function if $\ell = L$; where every $C_\ell$ and $A^c_\ell$, for each colour $c \in \mathrm{Col}$, are (learnable) real-valued matrices of dimension $\delta_\ell \times \delta_{\ell-1}$ and $\delta_\ell \times (k_c \delta_{\ell-1})$, respectively, for $k_c + 1$ the arity of $c$, and each $b_\ell$ is a (learnable) real-valued bias vector of dimension $\delta_\ell$; and where $[\mathbf{u}_1, \dots, \mathbf{u}_{k_c}]$ is the vector obtained by concatenating $\mathbf{u}_1, \dots, \mathbf{u}_{k_c}$. The classification function Cls maps $x \in \mathbb{R}$ to 1 if and only if $x \geq 0.5$. The feature vector dimensions $\delta_1 = \dots = \delta_{L-1}$ and the negative slope of the leaky ReLU activations are tuneable hyperparameters.
Step 3. The model returns 1 if $a \in$ true[$G_{\mathcal{R}'_q(K)}$] and 0 otherwise. The baselines $\mathrm{GNNQ}^-_L$ follow the same procedure, except that they skip KG augmentation and use $K$ instead of $\mathcal{R}'_q(K)$, thus relying on vanilla RGCNs [19].
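A single HRGCN layer of the kind described in Step 2 can be sketched in plain Python. The layer equation itself is not legible in this text, so the sketch assumes a plausible form: the node's previous vector multiplied by C, plus the sum, over all colours c and all tuples in the c-neighbourhood, of A^c applied to the concatenated previous-layer neighbour vectors, plus the bias b, followed by the activation. Sum aggregation in particular is our assumption, matrices are lists of rows, and all helper names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def hrgcn_layer(feats, edges, C, A, b, activation):
    """One layer (assumed form): own vector via C, summed messages via A[colour], bias b."""
    new_feats = {}
    for v, x in feats.items():
        h = [cx + bi for cx, bi in zip(matvec(C, x), b)]
        for w, colour, us in edges:
            if w != v:
                continue
            # Concatenate the previous-layer vectors of u1, ..., uk.
            concat = [val for u in us for val in feats[u]]
            h = [hi + ai for hi, ai in zip(h, matvec(A[colour], concat))]
        new_feats[v] = [activation(t) for t in h]
    return new_feats


def leaky_relu(slope):
    """Activation for hidden layers 1, ..., L-1."""
    return lambda t: t if t >= 0 else slope * t


def sigmoid(t):
    """Activation for the final layer L."""
    return 1.0 / (1.0 + math.exp(-t))


def classify(value):
    """Cls: 1 iff the final one-dimensional output is at least 0.5."""
    return 1 if value >= 0.5 else 0


feats = {"a": [0.0, 1.0], "b": [1.0, 1.0], "c": [2.0, 0.0]}
edges = {("a", "H", ("b", "c"))}  # a ternary hyperedge of colour H
out = hrgcn_layer(feats, edges, C=[[0.0, 0.0]], A={"H": [[0.0, 0.0, 1.0, 0.0]]},
                  b=[-0.5], activation=sigmoid)
print({v: classify(y[0]) for v, y in out.items()})  # {'a': 1, 'b': 0, 'c': 0}
```

A deeper model would chain `hrgcn_layer` calls, using `leaky_relu(slope)` in all but the last layer and `sigmoid` in the last.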
For each benchmark, we trained and evaluated the $\mathrm{GNNQ}_L$ instances for each $L \in \{h-1, h\}$ and the $\mathrm{GNNQ}^-_L$ instances for each $L \in \{h-1, h, h+1\}$, where $h$ is the height of the benchmark's tree-CQ. Before training, we randomly split the benchmark's training-and-validation set of examples into training and validation sets with ratio 1:1 for the WatDiv benchmarks and 2:1 for the FB15k237 benchmarks. In each training run (on the training set), we trained all model parameters for 250 epochs using the Adam optimiser and a standard binary cross-entropy loss, computed using the value of the (1-dimensional) feature vector in the last layer of the model as the prediction value (i.e., without applying the classification function). Each training run is specified by three hyperparameters: the learning rate, from $\{0.0001, 0.0006, \dots, 0.1001\}$; the negative slope of the leaky ReLU activation functions, from $\{0.001, 0.006, \dots, 0.101\}$; and the latent feature vector dimension, from $\{8, 9, \dots, 64\}$. We report results for the hyperparameter values maximising the average precision on the validation set, which we found by means of 100 training runs using Optuna (MedianPruner) with 5 warm-up runs, 30 warm-up epochs in every run, and step size 25.

Performance Metrics
For each benchmark, we evaluated the best trained models on the test set. For each model, we recorded the numbers tp, tn, fp, and fn of true positives, true negatives, false positives, and false negatives, respectively, and report the precision tp/(tp + fp) and the recall tp/(tp + fn). Furthermore, to test the robustness of our models under variations of the classification threshold, we modified each learned model by removing the application of the classification function, so that each modified model returns the real value labelling the node for the candidate constant in the last layer. We then applied the modified models to the test set and used the outputs to compute the average precision (AP), which is the area under the precision-recall curve.
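The metrics can be computed as follows. The sketch computes precision and recall from the confusion counts, and average precision as the step-wise sum over decreasing score thresholds of (R_n - R_{n-1}) P_n, which is the common way (used, e.g., by scikit-learn's `average_precision_score`) of approximating the area under the precision-recall curve. Whether the evaluation used exactly this estimator is not stated here, and the function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision tp/(tp+fp) and recall tp/(tp+fn), with 0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(labels, scores):
    """Step-wise AP with a threshold at each distinct score, highest first."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


print(tuple(round(m, 3) for m in precision_recall(tp=8, fp=2, fn=4)))  # (0.8, 0.667)
print(round(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), 4))  # 0.8333
```

Unlike precision and recall, AP does not depend on the 0.5 threshold of the classification function, which is why it is computed on the unthresholded outputs.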

Results
We report the results of our experiments for the WatDiv-Q$_i$ and FB15k237-Q$_i$ benchmarks in Tables 2 and 3, respectively. As one can see, the $\mathrm{GNNQ}_L$ instances outperform the $\mathrm{GNNQ}^-_L$ instances on almost all benchmarks when comparing instances whose HRGCN has the same number of layers. Furthermore, on the FB15k237-Q$_i$ benchmarks, the $\mathrm{GNNQ}_L$ instances with the smallest number of layers outperform all $\mathrm{GNNQ}^-_L$ instances by a significant margin. We attribute this to the fact that real-world KGs are noisier than synthetic ones, and the baselines are more vulnerable to noise since they must learn longer dependencies. These results confirm our hypothesis that augmenting input KGs with facts representing the parts of the KG that satisfy connected query fragments can lead to improved empirical performance in the IQA task.

Related Work
KG Completion, which predicts missing facts in a KG, is a central soft reasoning task on KGs. Existing KG completion approaches can be classified into two categories. Transductive KG completion models learn an embedding function that maps constants and predicates in a fixed KG to elements of a vector space. At inference time, a missing target fact can then be verified by first applying the embedding function to the predicate and constants used in the target fact, and then applying a fixed scoring function to the resulting embeddings [1,6,7,21,27]. Inductive KG completion assumes only a fixed set of predicates, and a trained model can be applied to any KG over these predicates. Many inductive KG completion approaches use GNNs [10,14,23,24], which can reason over the structure of KGs and are therefore inductive by design.
Query Embedding (QE) aims to answer monadic queries from various classes over the completion of an arbitrary but fixed KG. Common QE approaches are inspired by embedding-based KG completion methods [4,9,11,17,18,20]. To produce query answers that are not logically entailed, such QE models usually jointly learn embedding functions for constants and for queries during training. At inference time, a QE model first embeds the input query using the learnt embedding functions and then scores constants as potential answers based on the distance of their embeddings to the query embedding. Thus QE approaches aim to answer arbitrary queries over the predicates and constants of a fixed KG. This is orthogonal to our inductive setting, which assumes a fixed query but is applicable to arbitrary KGs (over a predefined set of predicates).

Connection of Logic and GNNs.
The increasing interest in GNNs across different domains has motivated the theoretical analysis of the expressiveness and limitations of GNNs. For example, it is easy to see that GNNs cannot distinguish between two non-isomorphic $k$-regular graphs of the same size with uniform node features. Further analysis connected GNNs to the well-known family of Weisfeiler-Lehman (WL) graph isomorphism tests; in particular, Xu et al. [26] and Morris et al. [15] independently showed that the most expressive GNNs can distinguish exactly the same nodes as the 1-dimensional WL test, and hence the same nodes as formulas in FOC$_2$, the two-variable fragment of first-order logic with counting quantifiers. Further deep connections between various logics and GNNs have recently followed these works [5,8,22], and we anticipate that these results will pave the way for future efficient neuro-symbolic AI approaches to many tasks in data and knowledge management.

Conclusion and Future Work
In this paper, we presented a novel neuro-symbolic approach to query answering over incomplete KGs. In contrast to existing embedding-based approaches, which assume a fixed KG, our approach is inductive, that is, it relies only on a fixed set of predicates and is thus applicable to arbitrary KGs over these predicates. Our approach proceeds in three phases. First, it uses symbolic rules to augment the input KG with facts representing subgraphs that match connected fragments of the query. Second, it encodes the augmented KG into a hypergraph with vector-labelled nodes. Third, it processes the hypergraph using a Hyper-Relational Graph Convolutional Network (HRGCN), a novel GNN architecture which generalises the well-known RGCN architecture. We then provided a theorem showing that the KG augmentation phase can considerably reduce the number of layers an HRGCN-based system needs to produce correct answers to a query on every KG. Finally, we implemented our approach in the GNNQ system and evaluated it on several novel benchmarks. Our experiments showed that KG augmentation indeed leads to improved empirical performance in the IQA task. The main challenge for future work is extending our approach to support more expressive queries. We shall also investigate the queries and completion functions that can be perfectly captured by our approach and its potential extensions.

Supplemental Material Statement.
A proof of Theorem 1 as well as details about the creation of the benchmark datasets can be found in the supplementary material. This material, together with the source code of GNNQ, the benchmarks, and the instructions for reproducing our experiments, is accessible through GitHub (https://github.com/KRR-Oxford/GNNQ).