Abstract
Abduction in description logics finds extensions of a knowledge base to make it entail an observation. As such, it can be used to explain why the observation does not follow, to repair incomplete knowledge bases, and to provide possible explanations for unexpected observations. We consider TBox abduction in the lightweight description logic \(\mathcal {EL}\), where the observation is a concept inclusion and the background knowledge is a TBox, i.e., a set of concept inclusions. To avoid useless answers, such problems usually come with further restrictions on the solution space and/or minimality criteria that help sort the chaff from the grain. We argue that existing minimality notions are insufficient, and introduce connection minimality. This criterion follows Occam’s razor by rejecting hypotheses that use concept inclusions unrelated to the problem at hand. We show how to compute a special class of connection-minimal hypotheses in a sound and complete way. Our technique is based on a translation to first-order logic, and constructs hypotheses based on prime implicates. We evaluate a prototype implementation of our approach on ontologies from the medical domain.
1 Introduction
Ontologies are used in areas like biomedicine or the semantic web to represent and reason about terminological knowledge. They normally consist of a set of axioms formulated in a description logic (DL), giving definitions of concepts, or stating relations between them. In the lightweight description logic \(\mathcal {EL}\) [2], particularly used in the biomedical domain, we find ontologies that contain around a hundred thousand axioms. For instance, SNOMED CT^{Footnote 1} contains over 350,000 axioms, and the Gene Ontology GO^{Footnote 2} defines over 50,000 concepts. A central reasoning task for ontologies is to determine whether one concept is subsumed by another, a question that can be answered in polynomial time [1], and rather efficiently in practice using highly optimized description logic reasoners [29]. If the answer to this question is unexpected or hints at an error, a natural interest is in an explanation for that answer, especially if the ontology is complex. But whereas explaining entailments (i.e., explaining why a concept subsumption holds) is well-researched in the DL literature and integrated into standard ontology editors [21, 22], the problem of explaining non-entailments has received less attention, and there is no standard tool support. Classical approaches involve counterexamples [5], or abduction.
In abduction, a non-entailment \(\mathcal {T} \not \models \alpha \), for a TBox \(\mathcal {T}\) and an observation \(\alpha \), is explained by providing a “missing piece”, the hypothesis, that, when added to the ontology, would entail \(\alpha \). Thus it provides possible fixes in case the entailment should hold. In the DL context, depending on the shape of the observation, one distinguishes between concept abduction [6], ABox abduction [7,8,9,10, 12, 19, 24, 25, 30, 31], TBox abduction [11, 33] or knowledge base abduction [14, 26]. We are focusing here on TBox abduction, where the ontology and hypothesis are TBoxes and the observation is a concept inclusion (CI), i.e., a single TBox axiom.
To illustrate this problem, consider the following TBox \(\mathcal {T} _{\text {a}}\) about academia, which states, in natural language:

“Being employed in a research position and having a qualifying diploma implies being a researcher.”

“Writing a research paper implies being a researcher.”

“Being a doctor implies holding a PhD qualification.”

“Being a professor is being a doctor employed at a (university) chair.”

“Being a funds provider implies writing grant applications.”
The observation \(\alpha _{\text {a}} =\mathsf {Professor} \sqsubseteq \mathsf {Researcher} \), “Being a professor implies being a researcher”, does not follow from \(\mathcal {T} _{\text {a}} \) although it should. We can use TBox abduction to find different ways of recovering this entailment.
Commonly, to avoid trivial answers, the user provides syntactic restrictions on hypotheses, such as a set of abducible axioms to pick from [8, 30], a set of abducible predicates [25, 26], or patterns on the shape of the solution [11]. But even with those restrictions in place, there may be many possible solutions and, to find the ones with the best explanatory potential, syntactic criteria are usually combined with minimality criteria such as subset minimality, size minimality, or semantic minimality [7]. Even combined, these minimality criteria still retain a major flaw. They allow for explanations that go against the principle of parsimony, also known as Occam’s razor, in that they may contain concepts that are completely unrelated to the problem at hand. As an illustration, let us return to our academia example. The TBoxes
\[\mathcal {H} _{\text {a}1} = \{\mathsf {Chair} \sqsubseteq \mathsf {ResearchPosition},\ \mathsf {PhD} \sqsubseteq \mathsf {Diploma} \} \quad \text {and}\quad \mathcal {H} _{\text {a}2} = \{\mathsf {Professor} \sqsubseteq \mathsf {FundsProvider},\ \mathsf {GrantApplication} \sqsubseteq \mathsf {ResearchPaper} \}\]
are two hypotheses solving the TBox abduction problem involving \(\mathcal {T} _{\text {a}} \) and \(\alpha _{\text {a}} \). Both of them are subset-minimal, have the same size, and are incomparable w.r.t. the entailment relation, so that traditional minimality criteria cannot distinguish them. However, intuitively, the second hypothesis feels more arbitrary than the first. Looking at \(\mathcal {H} _{\text {a}1} \), \(\mathsf {Chair} \) and \(\mathsf {ResearchPosition} \) occur in \(\mathcal {T} _{\text {a}}\) in concept inclusions where the concepts in \(\alpha _{\text {a}}\) also occur, and both \(\mathsf {PhD} \) and \(\mathsf {Diploma} \) are similarly related to \(\alpha _{\text {a}}\) but via the role \(\mathsf {qualification} \). In contrast, \(\mathcal {H} _{\text {a}2} \) involves the concepts \(\mathsf {FundsProvider} \) and \(\mathsf {GrantApplication} \) that are not related to \(\alpha _{\text {a}}\) in any way in \(\mathcal {T} _{\text {a}}\). In fact, any random concept inclusion \(A\sqsubseteq \exists \mathsf {writes}. B\) in \(\mathcal {T} _{\text {a}} \) would lead to a hypothesis similar to \(\mathcal {H} _{\text {a}2} \) where A replaces \(\mathsf {FundsProvider} \) and B replaces \(\mathsf {GrantApplication} \). Such explanations are not parsimonious.
We introduce a new minimality criterion called connection minimality that is parsimonious (Sect. 3), defined for the lightweight description logic \(\mathcal {EL}\). This criterion characterizes hypotheses for \(\mathcal {T} \) and \(\alpha \) that connect the left- and right-hand sides of the observation \(\alpha \) without introducing spurious connections. To achieve this, every left-hand side of a CI in the hypothesis must follow from the left-hand side of \(\alpha \) in \(\mathcal {T} \), and, taken together, all the right-hand sides of the CIs in the hypothesis must imply the right-hand side of \(\alpha \) in \(\mathcal {T} \), as is the case for \(\mathcal {H} _{\text {a}1} \). To compute connection-minimal hypotheses in practice, we present a technique based on first-order reasoning that proceeds in three steps (Sect. 4). First, we translate the abduction problem into a first-order formula \(\varPhi \). We then compute the prime implicates of \(\varPhi \), that is, a set of minimal logical consequences of \(\varPhi \) that subsume all other consequences of \(\varPhi \). In the final step, we construct, based on those prime implicates, solutions to the original problem. We prove that all hypotheses generated in this way satisfy the connection minimality criterion, and that the method is complete for a relevant subclass of connection-minimal hypotheses. We use the SPASS theorem prover [34] as a restricted SOS-resolution [18, 35] engine for the computation of prime implicates in a prototype implementation (Sect. 5), and we present an experimental analysis of its performance on a set of biomedical ontologies (Sect. 6). Our results indicate that our method can in many cases be applied in practice to compute connection-minimal hypotheses. A technical report accompanying this paper includes all proofs as well as a detailed example of our method as appendices [16].
There are not many techniques that can handle TBox abduction in \(\mathcal {EL}\) or more expressive DLs [11, 26, 33]. In [11], instead of a set of abducibles, a set of justification patterns is given, into which the solutions have to fit. An arbitrary oracle function is used to decide whether a solution is admissible or not (which may use abducibles, justification patterns, or something else), and it is shown that deciding the existence of hypotheses is tractable. However, unlike our approach, they only consider atomic CIs in hypotheses, while we also allow for hypotheses involving conjunction. The setting from [33] also considers \(\mathcal {EL}\), and abduction under various minimality notions such as subset minimality and size minimality. It presents practical algorithms, and an evaluation of an implementation for an always-true informativeness oracle (i.e., limited to subset minimality). Unlike our approach, it uses an external DL reasoner to decide entailment relationships. In contrast, we present an approach that directly exploits first-order reasoning, and thus has the potential to be generalisable to more expressive DLs.
While dedicated resolution calculi have been used before to solve abduction in DLs [9, 26], to the best of our knowledge, the only work that relies on first-order reasoning for DL abduction is [24]. Similar to our approach, it uses SOS-resolution, but to perform ABox abduction for the more expressive DL \(\mathcal {ALC}\). Apart from the different problem solved, in contrast to [24] we also provide a semantic characterization of the hypotheses generated by our method. We believe this characterization to be a major contribution of our paper. It provides an intuition of what parsimony is for this problem, independently of one’s ease with first-order logic calculi, which should facilitate the adoption of this minimality criterion by the DL community. Thanks to this characterization, our technique is calculus-agnostic. Any method to compute prime implicates in first-order logic can be a basis for our abduction technique, without additional theoretical work, which is not the case for [24]. Thus, abduction in \(\mathcal {EL}\) can benefit from the latest advances in prime implicate generation in first-order logic.
2 Preliminaries
We first recall the description logic \(\mathcal {EL}\) and its translation to first-order logic [2], as well as TBox abduction in this logic.
Let \(\mathsf {N_C} \) and \(\mathsf {N_R} \) be pairwise disjoint, countably infinite sets of unary predicates called atomic concepts and of binary predicates called roles, respectively. Generally, we use letters A, B, E, F,... for atomic concepts, and r for roles, possibly annotated. Letters C, D, possibly annotated, denote \(\mathcal {EL}\) concepts, built according to the syntax rule
\[C \;{:}{:}{=}\; \top \mid A \mid C\sqcap C \mid \exists r.C.\]
We implicitly represent \(\mathcal {EL} \) conjunctions as sets, that is, without order, nested conjunctions, and multiple occurrences of a conjunct. We use \(\sqcap \{C_1,\ldots ,C_m\}\) to abbreviate \(C_1\sqcap \ldots \sqcap C_m\), and identify the empty conjunction (\(m=0\)) with \(\top \). An \(\mathcal {EL}\) TBox \(\mathcal {T} \) is a finite set of concept inclusions (CIs) of the form \(C\sqsubseteq D\).
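\(\mathcal {EL}\) is a syntactic variant of a fragment of first-order logic that uses \(\mathsf {N_C} \) and \(\mathsf {N_R} \) as predicates. Specifically, TBoxes \(\mathcal {T} \) and CIs \(\alpha \) correspond to closed first-order formulas \(\pi (\mathcal {T})\) and \(\pi (\alpha )\) resp., while concepts C correspond to open formulas \(\pi (C,x)\) with a free variable x. In particular, we have
\[\begin{aligned} \pi (\top ,x)&=\mathbf {true}, & \pi (A,x)&=A(x),\\ \pi (C\sqcap D,x)&=\pi (C,x)\wedge \pi (D,x), & \pi (\exists r.C,x)&=\exists y.(r(x,y)\wedge \pi (C,y)),\\ \pi (C\sqsubseteq D)&=\forall x.(\pi (C,x)\rightarrow \pi (D,x)), & \pi (\mathcal {T})&=\bigwedge \{\pi (\alpha )\mid \alpha \in \mathcal {T} \}. \end{aligned}\]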
As is common, we often omit the \(\bigwedge \) in conjunctions \(\bigwedge \varPhi \), that is, we identify sets of formulas with the conjunction over those. The notions of a term t; an atom \(P(\bar{t})\) where \(\bar{t}\) is a sequence of terms; a positive literal \(P(\bar{t})\); a negative literal \(\lnot P(\bar{t})\); and a clause, Horn, definite, positive or negative, are defined as usual for first-order logic, and so are entailment and satisfaction of first-order formulas.
We identify CIs and TBoxes with their translation into first-order logic, and can thus speak of the entailment between formulas, CIs and TBoxes. When \(\mathcal {T} \models C\sqsubseteq D\) for some \(\mathcal {T}\), we call C a subsumee of D and D a subsumer of C. We adhere here to the definition of the word “subsume”: “to include or contain something else”, although the terminology is reversed in first-order logic. We say two TBoxes \(\mathcal {T} _1\), \(\mathcal {T} _2\) are equivalent, denoted \(\mathcal {T} _1\equiv \mathcal {T} _2\), iff \(\mathcal {T} _1\models \mathcal {T} _2\) and \(\mathcal {T} _2\models \mathcal {T} _1\). For example \(\{D\sqsubseteq C_1,\ldots , D\sqsubseteq C_n\}\equiv \{D\sqsubseteq C_1\sqcap \ldots \sqcap C_n\}\). It is well known that, due to the absence of concept negation, every \(\mathcal {EL}\) TBox is consistent.
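The standard translation \(\pi \) can be illustrated with a small sketch. The encoding of concepts as pairs `(atoms, successors)`, the function names `pi_concept` and `pi_ci`, and the variable naming scheme are assumptions made for this illustration only; formulas are produced as plain strings.

```python
from itertools import count

# A concept is encoded as (atoms, successors): atoms is a collection of
# atomic concept names, successors a list of (role, concept) pairs.
# The empty concept ((), []) stands for the top concept.

def pi_concept(concept, var, fresh):
    """Return the open first-order formula for `concept` with free variable `var`."""
    atoms, successors = concept
    parts = [f"{a}({var})" for a in sorted(atoms)]
    for role, sub in successors:
        y = f"y{next(fresh)}"  # fresh quantified variable
        parts.append(f"∃{y}.({role}({var},{y}) ∧ {pi_concept(sub, y, fresh)})")
    return " ∧ ".join(parts) if parts else "true"

def pi_ci(lhs, rhs):
    """Return the closed first-order formula for the CI lhs ⊑ rhs."""
    fresh = count(1)
    return f"∀x.({pi_concept(lhs, 'x', fresh)} → {pi_concept(rhs, 'x', fresh)})"

# "Being a doctor implies holding a PhD qualification."
doctor = (("Doctor",), [])
has_phd = ((), [("qualification", (("PhD",), []))])
print(pi_ci(doctor, has_phd))
```

The example prints the formula for the CI \(\mathsf {Doctor} \sqsubseteq \exists \mathsf {qualification}.\mathsf {PhD} \) from the academia example.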
The abduction problem we are concerned with in this paper is the following:
Definition 1
An \(\mathcal {EL}\) TBox abduction problem (shortened to abduction problem) is a tuple \(\langle \mathcal {T},\Sigma ,C_1\sqsubseteq C_2\rangle \), where \(\mathcal {T} \) is a TBox called the background knowledge, \(\Sigma \) is a set of atomic concepts called the abducible signature, and \(C_1\sqsubseteq C_2\) is a CI called the observation, s.t. \(\mathcal {T} \not \models C_1\sqsubseteq C_2\). A solution to this problem is a TBox
\[\mathcal {H} \subseteq \{A_1\sqcap \dots \sqcap A_n\sqsubseteq B_1\sqcap \dots \sqcap B_m\mid \{A_1,\ldots ,A_n,B_1,\ldots ,B_m\}\subseteq \Sigma \}\]
where \(m>0\), \(n\ge 0\) and such that \(\mathcal {T} \cup \mathcal {H} \models C_1\sqsubseteq C_2\) and, for all CIs \(\alpha \in \mathcal {H} \), \(\mathcal {T} \not \models \alpha \). A solution to an abduction problem is called a hypothesis.
For example, \(\mathcal {H} _{\text {a}1} \) and \(\mathcal {H} _{\text {a}2} \) are solutions for \(\langle \mathcal {T} _{\text {a}},\Sigma ,\alpha _{\text {a}} \rangle \), as long as \(\Sigma \) contains all the atomic concepts that occur in them. Note that in our setting, as in [6, 33], concept inclusions in a hypothesis are flat, i.e., they contain no existential role restrictions. While this restricts the solution space for a given problem, it is possible to bypass this limitation in a targeted way, by introducing fresh atomic concepts equivalent to a concept of interest. We exclude the consistency requirement \(\mathcal {T} \cup \mathcal {H} \not \models \bot \), that is given in other definitions of DL abduction problem [25], since \(\mathcal {EL}\) TBoxes are always consistent. We also allow \(m>1\) instead of the usual \(m=1\). This produces the same hypotheses modulo equivalence.
For simplicity, we assume in the following that the concepts \(C_1\) and \(C_2\) in the abduction problem are atomic. We can always introduce fresh atomic concepts \(A_1\) and \(A_2\) with \(A_1\sqsubseteq C_1\) and \(C_2\sqsubseteq A_2\) to solve the problem for complex concepts.
Common minimality criteria include subset minimality, size minimality and semantic minimality, that respectively favor \(\mathcal {H} \) over \(\mathcal {H} '\) if: \(\mathcal {H} \subsetneq \mathcal {H} '\); the number of atomic concepts in \(\mathcal {H} \) is smaller than in \(\mathcal {H} '\); and if \(\mathcal {H} \models \mathcal {H} '\) but \(\mathcal {H} '\not \models \mathcal {H} \).
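As a small illustration of why these criteria do not separate the two hypotheses of the academia example, the following sketch compares them under subset and size minimality. The encoding of flat CIs as pairs of frozensets, the reading of "size" as the number of atomic concept occurrences, and the helper names are assumptions of this sketch. Semantic minimality is not covered, since it requires an entailment check.

```python
# Each flat CI is encoded as (lhs, rhs), two frozensets of atomic concept names.
H_a1 = {
    (frozenset({"Chair"}), frozenset({"ResearchPosition"})),
    (frozenset({"PhD"}), frozenset({"Diploma"})),
}
H_a2 = {
    (frozenset({"Professor"}), frozenset({"FundsProvider"})),
    (frozenset({"GrantApplication"}), frozenset({"ResearchPaper"})),
}

def size(hypothesis):
    """Number of atomic concept occurrences in the hypothesis (assumed reading)."""
    return sum(len(lhs) + len(rhs) for lhs, rhs in hypothesis)

def subset_preferred(h, h_prime):
    """True if h is favored over h_prime by subset minimality (h is a strict subset)."""
    return h < h_prime

def size_preferred(h, h_prime):
    """True if h is favored over h_prime by size minimality."""
    return size(h) < size(h_prime)

for name, prefers in [("subset", subset_preferred), ("size", size_preferred)]:
    print(name, prefers(H_a1, H_a2), prefers(H_a2, H_a1))
```

Neither criterion prefers one hypothesis over the other in either direction.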
3 Connection-Minimal Abduction
To address the lack of parsimony of common minimality criteria, illustrated in the academia example, we introduce connection minimality. Intuitively, connection minimality only accepts those hypotheses that ensure that every CI in the hypothesis is connected to both \(C_1\) and \(C_2\) in \(\mathcal {T} \), as is the case for \(\mathcal {H} _{\text {a}1} \) in the academia example. The definition of connection minimality is based on the following ideas: 1) Hypotheses for the abduction problem should create a connection between \(C_1\) and \(C_2\), which can be seen as a concept D that satisfies \(\mathcal {T} \cup \mathcal {H} \models C_1\sqsubseteq D\), \(D\sqsubseteq C_2\). 2) To ensure parsimony, we want this connection to be based on concepts \(D_1\) and \(D_2\) for which we already have \(\mathcal {T} \models C_1\sqsubseteq D_1\), \(D_2\sqsubseteq C_2\). This prevents the introduction of unrelated concepts in the hypothesis. Note however that \(D_1\) and \(D_2\) can be complex, thus the connection from \(C_1\) to \(D_1\) (resp. \(D_2\) to \(C_2\)) can be established by arbitrarily long chains of concept inclusions. 3) We additionally want to make sure that the connecting concepts are not more complex than necessary, and that \(\mathcal {H} \) only contains CIs that directly connect parts of \(D_2\) to parts of \(D_1\) by closely following their structure.
To address point 1), we simply introduce connecting concepts formally.
Definition 2
Let \(C_1\) and \(C_2\) be concepts. A concept D connects \(C_1\) to \(C_2\) in \(\mathcal {T} \) if and only if \(\mathcal {T} \models C_1\sqsubseteq D\) and \(\mathcal {T} \models D\sqsubseteq C_2\).
Note that if \(\mathcal {T} \models C_1 \sqsubseteq C_2\) then both \(C_1\) and \(C_2\) are connecting concepts from \(C_1\) to \(C_2\), and if \(\mathcal {T} \not \models C_1 \sqsubseteq C_2\), the case of interest, neither of them is.
To address point 2), we must capture how a hypothesis creates the connection between the concepts \(C_1\) and \(C_2\). As argued above, this is established via concepts \(D_1\) and \(D_2\) that satisfy \(\mathcal {T} \models C_1\sqsubseteq D_1\), \(D_2\sqsubseteq C_2\). Note that having only two concepts \(D_1\) and \(D_2\) is exactly what makes the approach parsimonious. If there was only one concept, \(C_1\) and \(C_2\) would already be connected, and as soon as there are more than two concepts, hypotheses start becoming more arbitrary: for a very simple example with unrelated concepts, assume given a TBox that entails \(\mathsf {Lion} \sqsubseteq \mathsf {Felidae} \), \(\mathsf {Mammal} \sqsubseteq \mathsf {Animal} \) and \(\mathsf {House} \sqsubseteq \mathsf {Building} \). A possible hypothesis to explain \(\mathsf {Lion} \sqsubseteq \mathsf {Animal} \) is \(\{\mathsf {Felidae} \sqsubseteq \mathsf {House},\mathsf {Building} \sqsubseteq \mathsf {Mammal} \}\) but this explanation is more arbitrary than \(\{\mathsf {Felidae} \sqsubseteq \mathsf {Mammal} \}\)—as is the case when comparing \(\mathcal {H} _{\text {a}2} \) with \(\mathcal {H} _{\text {a}1} \) in the academia example—because of the lack of connection of \(\mathsf {House} \sqsubseteq \mathsf {Building} \) with both \(\mathsf {Lion} \) and \(\mathsf {Animal} \). Clearly this CI could be replaced by any other CI entailed by \(\mathcal {T}\), which is what we want to avoid.
We can represent the structure of \(D_1\) and \(D_2\) in graphs by using \(\mathcal {EL}\) description trees, originally from Baader et al. [3].
Definition 3
An \(\mathcal {EL}\) description tree is a finite labeled tree \(\mathfrak {T} =(V,E,v_0,l)\) where V is a set of nodes with root \(v_0\in V\), the nodes \(v\in V\) are labeled with \(l(v)\subseteq \mathsf {N_C} \), and the (directed) edges \(vrw \in E\) are such that \(v,w\in V\) and are labeled with \(r\in \mathsf {N_R} \).
Given a tree \(\mathfrak {T} =(V,E,v_0,l)\) and \(v\in V\), we denote by \(\mathfrak {T} (v)\) the subtree of \(\mathfrak {T} \) that is rooted in v. If \(l(v_0)=\{A_1,\ldots ,A_k\}\) and \(v_1\), \(\ldots \), \(v_n\) are all the children of \(v_0\), we can define the concept represented by \(\mathfrak {T} \) recursively using \( C_\mathfrak {T} =A_1\sqcap \ldots \sqcap A_k\sqcap \exists r_1. C_{\mathfrak {T} (v_1)}\sqcap \ldots \sqcap \exists r_n.C_{\mathfrak {T} (v_n)} \) where for \(j\in \{1,\ldots ,n\}\), \(v_0 r_j v_j\in E\). Conversely, we can define \(\mathfrak {T} _C\) for a concept \(C=A_1\sqcap \ldots \sqcap A_k\sqcap \exists r_1.C_1\sqcap \ldots \sqcap \exists r_n.C_n\) inductively based on the pairwise disjoint description trees \(\mathfrak {T} _{C_i}=(V_i, E_i, v_i, l_i)\), \(i\in \{1,\ldots , n\}\). Specifically, \(\mathfrak {T} _C=(V_C, E_C,v_C, l_C)\), where
\[V_C=\{v_C\}\cup \bigcup _{i=1}^n V_i,\qquad E_C=\{v_C r_i v_i\mid 1\le i\le n\}\cup \bigcup _{i=1}^n E_i,\qquad l_C(v)={\left\{ \begin{array}{ll} \{A_1,\ldots ,A_k\} &{} \text {if } v=v_C,\\ l_i(v) &{} \text {if } v\in V_i. \end{array}\right. }\]
If \(\mathcal {T} =\emptyset \), then subsumption between \(\mathcal {EL}\) concepts is characterized by the existence of a homomorphism between the corresponding description trees [3]. We generalise this notion to also take the TBox into account.
Definition 4
Let \(\mathfrak {T} _1=(V_1,E_1,v_0,l_1)\) and \(\mathfrak {T} _2=(V_2,E_2,w_0,l_2)\) be two description trees and \(\mathcal {T} \) a TBox. A mapping \(\phi : V_2\rightarrow V_1\) is a \(\mathcal {T}\)-homomorphism from \(\mathfrak {T} _2\) to \(\mathfrak {T} _1\) if and only if the following conditions are satisfied:

1. \(\phi (w_0)=v_0\)

2. \(\phi (v)r\phi (w)\in E_1\) for all \(vrw\in E_2\)

3. for every \(v\in V_1\) and \(w\in V_2\) with \(v=\phi (w)\), \(\mathcal {T} \models \sqcap l_1(v)\sqsubseteq \sqcap l_2(w)\)
If only 1 and 2 are satisfied, then \(\phi \) is called a weak homomorphism.
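Conditions 1 and 2 are purely structural and can be checked mechanically. The sketch below builds description trees from concepts and tests whether a mapping is a weak homomorphism; it also checks condition 3 in the special case \(\mathcal {T} =\emptyset \), where entailment between conjunctions of atomic concepts reduces to set inclusion of the labels. The concept encoding as `(atoms, successors)` pairs, the integer node identifiers, and all function names are assumptions of this sketch.

```python
def build_tree(concept):
    """Return (labels, edges, root) for a concept (atoms, [(role, subconcept), ...])."""
    labels, edges = {}, set()

    def add(c):
        node = len(labels)           # next unused integer identifier
        atoms, successors = c
        labels[node] = frozenset(atoms)
        for role, sub in successors:
            edges.add((node, role, add(sub)))
        return node

    root = add(concept)
    return labels, edges, root

def is_weak_homomorphism(phi, tree2, tree1):
    """Check conditions 1 and 2 for a mapping phi from the nodes of tree2 to tree1."""
    _, edges1, root1 = tree1
    labels2, edges2, root2 = tree2
    if set(phi) != set(labels2):     # phi must be defined on every node of tree2
        return False
    if phi[root2] != root1:          # condition 1: root is mapped to root
        return False
    # condition 2: every edge of tree2 is mapped to an edge of tree1
    return all((phi[v], r, phi[w]) in edges1 for v, r, w in edges2)

def is_homomorphism_empty_tbox(phi, tree2, tree1):
    """Also check condition 3 for the empty TBox (label inclusion)."""
    labels1, labels2 = tree1[0], tree2[0]
    return is_weak_homomorphism(phi, tree2, tree1) and all(
        labels2[w] <= labels1[phi[w]] for w in labels2
    )

# D1 = ∃r.A ⊓ ∃r2.(B ⊓ Bp) and D2 = ∃r2.B
D1 = ((), [("r", (("A",), [])), ("r2", (("B", "Bp"), []))])
D2 = ((), [("r2", (("B",), []))])
T1, T2 = build_tree(D1), build_tree(D2)
print(is_weak_homomorphism({0: 0, 1: 2}, T2, T1))
print(is_homomorphism_empty_tbox({0: 0, 1: 2}, T2, T1))
```

In this example the mapping sends the single child of the root of the second tree to the `r2`-child of the root of the first tree.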
\(\mathcal {T}\)-homomorphisms for a given TBox \(\mathcal {T}\) capture subsumption w.r.t. \(\mathcal {T}\). If there exists a \(\mathcal {T}\)-homomorphism \(\phi \) from \(\mathfrak {T} _2\) to \(\mathfrak {T} _1\), then \(\mathcal {T} \models C_{\mathfrak {T} _1}\sqsubseteq C_{\mathfrak {T} _2}\). This can be shown easily by structural induction using the definitions [16]. The weak homomorphism is the structure on which a \(\mathcal {T}\)-homomorphism can be built by adding some hypothesis \(\mathcal {H}\) to \(\mathcal {T}\). It is used to reveal missing links between a subsumee \(D_2\) of \(C_2\) and a subsumer \(D_1\) of \(C_1\), that can be added using \(\mathcal {H}\).
Example 5
Consider the concepts
from the academia example. Figure 1 illustrates description trees for \(D_1\) (left) and \(D_2\) (right). The curved arrows show a weak homomorphism from \(\mathfrak {T} _{D_2}\) to \(\mathfrak {T} _{D_1}\) that can be strengthened into a \(\mathcal {T}\)-homomorphism for some TBox \(\mathcal {T} \) that corresponds to the set of CIs in \(\mathcal {H} _{\text {a}1} \cup \{\top \sqsubseteq \top \}\). The figure can also be used to illustrate what we mean by connection minimality: in order to create a connection between \(D_1\) and \(D_2\), we should only add the CIs from \(\mathcal {H} _{\text {a}1} \cup \{\top \sqsubseteq \top \}\) unless they are already entailed by \(\mathcal {T} _{\text {a}} \). In practice, this means the weak homomorphism from \(D_2\) to \(D_1\) becomes a \((\mathcal {T} _{\text {a}} \cup \mathcal {H} _{\text {a}1})\)-homomorphism.
To address point 3), we define a partial order \(\preceq _\sqcap \) on concepts, s.t. \(C\preceq _\sqcap D\) if we can turn D into C by removing conjuncts in subexpressions, e.g., \(\exists r'. B \preceq _\sqcap \exists r. A \sqcap \exists r'. (B \sqcap B') \). Formally, this is achieved by the following definition.
Definition 6
Let C and D be arbitrary concepts. Then \(C\preceq _\sqcap D\) if either:

\(C = D\),

\(D = D' \sqcap D''\), and \(C\preceq _\sqcap D'\), or

\(C = \exists r.C'\), \(D = \exists r.D'\) and \(C'\preceq _\sqcap D'\).
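The order \(\preceq _\sqcap \) can be tested recursively. The sketch below follows the informal description (D can be turned into C by removing conjuncts in subexpressions) with conjunctions treated as sets: the atomic conjuncts of C must be among those of D, and the existential conjuncts of C must be matched to distinct existential conjuncts of D with the same role. This reading, the `(atoms, successors)` encoding and the name `preceq` are assumptions of the sketch, not taken from the definition verbatim.

```python
def preceq(c, d):
    """Return True if concept c can be obtained from d by removing conjuncts."""
    atoms_c, succ_c = c
    atoms_d, succ_d = d
    if not set(atoms_c) <= set(atoms_d):
        return False

    def match(i, used):
        # Try to match the i-th existential conjunct of c with an unused one of d.
        if i == len(succ_c):
            return True
        role, sub = succ_c[i]
        return any(
            j not in used
            and succ_d[j][0] == role
            and preceq(sub, succ_d[j][1])
            and match(i + 1, used | {j})
            for j in range(len(succ_d))
        )

    return match(0, frozenset())

# ∃r2.B  versus  ∃r.A ⊓ ∃r2.(B ⊓ Bp), the example given in the text
small = ((), [("r2", (("B",), []))])
large = ((), [("r", (("A",), [])), ("r2", (("B", "Bp"), []))])
print(preceq(small, large), preceq(large, small))
```

The backtracking search is exponential in the worst case, which is acceptable for a small illustration but not meant as an efficient procedure.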
We can finally capture our ideas on connection minimality formally.
Definition 7
(Connection-Minimal Abduction). Given an abduction problem \(\langle \mathcal {T},\Sigma ,C_1\sqsubseteq C_2\rangle \), a hypothesis \(\mathcal {H}\) is connection-minimal if there exist concepts \(D_1\) and \(D_2\) built over \(\Sigma \cup \mathsf {N_R} \) and a mapping \(\phi \) satisfying each of the following conditions:

1. \(\mathcal {T} \models C_1\sqsubseteq D_1\),

2. \(D_2\) is a \(\preceq _\sqcap \)-minimal concept s.t. \(\mathcal {T} \models D_2\sqsubseteq C_2\),

3. \(\phi \) is a weak homomorphism from the tree \(\mathfrak {T} _{D_2}=(V_2,E_2,w_0,l_2)\) to the tree \(\mathfrak {T} _{D_1}=(V_1,E_1,v_0,l_1)\), and

4. \(\mathcal {H} =\{\sqcap l_1(\phi (w))\sqsubseteq \sqcap l_2(w)\mid w\in V_2\wedge \mathcal {T} \not \models \sqcap l_1(\phi (w))\sqsubseteq \sqcap l_2(w)\}\).
\(\mathcal {H} \) is additionally called packed if the left-hand sides of the CIs in \(\mathcal {H} \) cannot hold more conjuncts than they do, which is formally stated as: for \(\mathcal {H}\), there is no \(\mathcal {H} '\) defined from the same \(D_2\) and a \(D_1'\) and \(\phi '\) s.t. there is a node \(w\in V_2\) for which \(l_1(\phi (w))\subsetneq l_1'(\phi '(w))\) and \(l _1(\phi (w'))=l_1'(\phi '(w'))\) for \(w'\ne w\).
Straightforward consequences of Definition 7 include that \(\phi \) is a \((\mathcal {T} \cup \mathcal {H})\)-homomorphism from \(\mathfrak {T} _{D_2}\) to \(\mathfrak {T} _{D_1}\) and that \(D_1\) and \(D_2\) are connecting concepts from \(C_1\) to \(C_2\) in \(\mathcal {T} \cup \mathcal {H} \) so that \(\mathcal {T} \cup \mathcal {H} \models C_1\sqsubseteq C_2\) as wanted [16]. With the help of Fig. 1 and Example 5, one easily establishes that hypothesis \(\mathcal {H} _{\text {a}1} \) is connection-minimal, and even packed. Connection minimality rejects \(\mathcal {H} _{\text {a}2} \), as a single \(\mathcal {T} '\)-homomorphism for some \(\mathcal {T} '\) between two concepts \(D_1\) and \(D_2\) would be insufficient: we would need two weak homomorphisms, one linking \(\mathsf {Professor} \) to \(\mathsf {FundsProvider} \) and another linking \(\exists \mathsf {writes}.\mathsf {GrantApplication} \) to \(\exists \mathsf {writes}.\mathsf {ResearchPaper} \).
4 Computing Connection-Minimal Hypotheses Using Prime Implicates
To compute connection-minimal hypotheses in practice, we propose a method based on first-order prime implicates, that can be derived by resolution. We assume the reader is familiar with the basics of first-order resolution, and do not reintroduce notions of clauses, Skolemization and resolution inferences here (for details, see [4]). In our context, every term is built on variables, denoted x, y, a single constant \(\mathtt {sk}_0\) and unary Skolem functions usually denoted \(\mathtt {sk}\), possibly annotated. Prime implicates are defined as follows.
Definition 8
(Prime Implicate). Let \(\varPhi \) be a set of clauses. A clause \(\varphi \) is an implicate of \(\varPhi \) if \(\varPhi \models \varphi \). Moreover \(\varphi \) is prime if for any other implicate \(\varphi '\) of \(\varPhi \) s.t. \(\varphi '\models \varphi \), it also holds that \(\varphi \models \varphi '\).
Let \(\Sigma \subseteq \mathsf {N_C} \) be a set of unary predicates. Then \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\) denotes the set of all positive ground prime implicates of \(\varPhi \) that only use predicate symbols from \(\Sigma \cup \mathsf {N_R} \), while \(\mathcal {PI}^{g-}_\Sigma (\varPhi )\) denotes the set of all negative ground prime implicates of \(\varPhi \) that only use predicate symbols from \(\Sigma \cup \mathsf {N_R} \).
Example 9
Given a set of clauses \(\varPhi = \{A_1(\mathtt {sk}_0),\lnot B_1(\mathtt {sk}_0), \lnot A_1(x)\vee r(x,\mathtt {sk}(x)), \lnot A_1(x)\vee A_2(\mathtt {sk}(x)), \lnot B_2(x)\vee \lnot r(x,y)\vee \lnot B_3(y)\vee B_1(x)\}\), the ground prime implicates of \(\varPhi \) for \(\Sigma = \mathsf {N_C} \) are, on the positive side, \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}=\{A_1(\mathtt {sk}_0), A_2(\mathtt {sk}(\mathtt {sk}_0)), r(\mathtt {sk}_0,\mathtt {sk}(\mathtt {sk}_0))\}\) and, on the negative side, \(\mathcal {PI}^{g-}_\Sigma (\varPhi )=\{\lnot B_1(\mathtt {sk}_0), \lnot B_2(\mathtt {sk}_0)\vee \lnot B_3(\mathtt {sk}(\mathtt {sk}_0))\}\). They are implicates because all of them are entailed by \(\varPhi \). For a ground implicate \(\varphi \), another ground implicate \(\varphi '\) such that \(\varphi '\models \varphi \) and \(\varphi \not \models \varphi '\) can only be obtained from \(\varphi \) by dropping literals. Such an operation does not produce another implicate for any of the clauses presented above as belonging to \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\) and \(\mathcal {PI}^{g-}_\Sigma (\varPhi )\), thus they really are all prime.
To generate hypotheses, we translate the abduction problem into a set of first-order clauses, from which we can infer prime implicates that we then combine to obtain the result as illustrated in Fig. 2. In more detail: We first translate the problem into a set \(\varPhi \) of Horn clauses. Prime implicates can be computed using an off-the-shelf tool [13, 28] or, in our case, a slight extension of the resolution-based version of the SPASS theorem prover [34] using the set-of-support strategy and some added features described in Sect. 5. Since \(\varPhi \) is Horn, \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\) contains only unit clauses. A final recombination step looks at the clauses in \(\mathcal {PI}^{g-}_\Sigma (\varPhi )\) one after the other. These correspond to candidates for the connecting concepts \(D_2\) of Definition 7. Recombination attempts to match each literal in one such clause with unit clauses from \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\). If such a match is possible, it produces a suitable \(D_1\) to match \(D_2\), and allows the creation of a solution to the abduction problem. The set \(\mathcal {S}\) contains all the hypotheses thus obtained.
In what follows, we present our translation of abduction problems into first-order logic and formalize the construction of hypotheses from the prime implicates of this translation. We then show how to obtain termination for the prime implicate generation process with soundness and completeness guarantees on the solutions computed.
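Abduction Method. We assume the \(\mathcal {EL}\) TBox in the input is in normal form as defined, e.g., by Baader et al. [2]. Thus every CI is of one of the following forms:
\[A\sqsubseteq B,\qquad A_1\sqcap A_2\sqsubseteq B,\qquad \exists r.A\sqsubseteq B,\qquad A\sqsubseteq \exists r.B,\]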
where A, \(A_1\), \(A_2\), \(B\in \mathsf {N_C} \cup \{\top \}\).
The use of normalization is justified by the following lemma.
Lemma 10
For every \(\mathcal {EL}\) TBox \(\mathcal {T} \), we can compute in polynomial time an \(\mathcal {EL}\) TBox \(\mathcal {T} '\) in normal form such that for every other TBox \(\mathcal {H} \) and every CI \(C\sqsubseteq D\) that use only names occurring in \(\mathcal {T} \), we have \(\mathcal {T} \cup \mathcal {H} \models C\sqsubseteq D\) iff \(\mathcal {T} '\cup \mathcal {H} \models C\sqsubseteq D\).
After the normalisation, we eliminate occurrences of \(\top \), replacing this concept everywhere by the fresh atomic concept \(A_\top \). We furthermore add \(\exists r.A_\top \sqsubseteq A_\top \) and \(B\sqsubseteq A_\top \) in \(\mathcal {T} \) for every role r and atomic concept B occurring in \(\mathcal {T} \). This simulates the semantics of \(\top \) for \(A_\top \), namely the implicit property that \(C\sqsubseteq \top \) holds for any C no matter what the TBox is. In particular, this ensures that whenever there is a positive prime implicate B(t) or \(r(t,t')\), \(A_\top (t)\) also becomes a prime implicate. Note that normalisation and \(\top \) elimination extend the signature, and thus potentially the solution space of the abduction problem. This is remedied by intersecting the set of abducible predicates \(\Sigma \) with the signature of the original input ontology. We assume that \(\mathcal {T} \) is in normal form and without \(\top \) in the rest of the paper.
We denote by \(\mathcal {T} ^-\) the result of renaming all atomic concepts A in \(\mathcal {T} \) using fresh duplicate symbols \(A^-\). This renaming is done only on concepts but not on roles, and on \(C_2\) but not on \(C_1\) in the observation. This ensures that the literals in a clause of \(\mathcal {PI}^{g-}_\Sigma (\varPhi )\) all relate to the conjuncts of a \(\preceq _\sqcap \)-minimal subsumee of \(C_2\). Without it, some of these conjuncts would not appear in the negative implicates due to the presence of their positive counterparts as atoms in \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\). The translation of the abduction problem \(\langle \mathcal {T},\Sigma ,C_1\sqsubseteq C_2\rangle \) is defined as the Skolemization of
\[\pi (\mathcal {T} \uplus \mathcal {T} ^-)\wedge \lnot \pi (C_1\sqsubseteq C_2^-)\]
where \(\mathtt {sk}_0\) is used as the unique fresh Skolem constant such that the Skolemization of \(\lnot \pi (C_1\sqsubseteq C_2^-)\) results in \(\{C_1(\mathtt {sk}_0),\lnot C_2^-(\mathtt {sk}_0)\}\). This translation is usually denoted \(\varPhi \) and always considered in clausal normal form.
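To make the shape of the resulting clause set concrete, here is a minimal sketch of how CIs in the usual \(\mathcal {EL}\) normal form could be turned into Horn clauses, together with the two unit clauses stemming from the negated observation. The tuple encoding of CIs, the function names, and the Skolem function names `sk1`, `sk2`, ... are assumptions of this sketch; the renaming of concepts into their duplicates and the elimination of \(\top \) are assumed to have been done by the caller.

```python
from itertools import count

def clausify(tbox):
    """Translate normal-form CIs into Horn clauses (lists of literal strings).

    CIs are encoded as tuples (an encoding chosen for this sketch):
      ("sub", A, B)        for A ⊑ B
      ("conj", A1, A2, B)  for A1 ⊓ A2 ⊑ B
      ("ex_l", r, A, B)    for ∃r.A ⊑ B
      ("ex_r", A, r, B)    for A ⊑ ∃r.B
    """
    fresh = count(1)
    clauses = []
    for ci in tbox:
        kind = ci[0]
        if kind == "sub":
            _, a, b = ci
            clauses.append([f"¬{a}(x)", f"{b}(x)"])
        elif kind == "conj":
            _, a1, a2, b = ci
            clauses.append([f"¬{a1}(x)", f"¬{a2}(x)", f"{b}(x)"])
        elif kind == "ex_l":
            _, r, a, b = ci
            clauses.append([f"¬{r}(x,y)", f"¬{a}(y)", f"{b}(x)"])
        elif kind == "ex_r":
            _, a, r, b = ci
            sk = f"sk{next(fresh)}"  # one Skolem function per existential CI
            clauses.append([f"¬{a}(x)", f"{r}(x,{sk}(x))"])
            clauses.append([f"¬{a}(x)", f"{b}({sk}(x))"])
        else:
            raise ValueError(f"unknown CI form: {kind}")
    return clauses

def translate(tbox, c1, c2_renamed):
    """Clauses of the TBox plus the negated observation on the constant sk0."""
    return clausify(tbox) + [[f"{c1}(sk0)"], [f"¬{c2_renamed}(sk0)"]]

for clause in translate([("ex_r", "A1", "r", "A2")], "A1", "B1"):
    print(" ∨ ".join(clause))
```

Every clause produced this way contains at most one positive literal, which matches the observation that the translation is Horn.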
Theorem 11
Let \(\langle \mathcal {T},\Sigma ,C_1\sqsubseteq C_2\rangle \) be an abduction problem and \(\varPhi \) be its first-order translation. Then, a TBox \(\mathcal {H} '\) is a packed connection-minimal solution to the problem if and only if an equivalent hypothesis \(\mathcal {H} \) can be constructed from non-empty sets \(\mathcal {A} \) and \(\mathcal {B} \) of atoms satisfying:

\(\mathcal {B} = \{B_1(t_1), \ldots , B_m(t_m)\}\) s.t. \(\left( \lnot B_1^-(t_1)\vee \dots \vee \lnot B_m^-(t_m) \right) \in \mathcal {PI}^{g-}_\Sigma (\varPhi )\),

for all \(t\in \{t_1,\ldots ,t_m\}\) there exists an A s.t. \(A(t)\in {\mathcal {PI}^{g+}_\Sigma (\varPhi )}\),

\(\mathcal {A} =\{A(t)\in {\mathcal {PI}^{g+}_\Sigma (\varPhi )}\mid t\text { is one of }t_1,\ldots ,t_m\}\), and

\(\mathcal {H} =\{C_{\mathcal {A},t}\sqsubseteq C_{\mathcal {B},t} \mid t\text { is one of }t_1,\ldots ,t_m \text { and } C_{\mathcal {B},t}\not \preceq _\sqcap C_{\mathcal {A},t}\}\), where \(C_{\mathcal {A},t}=\sqcap _{A(t)\in \mathcal {A}}A\) and \(C_{\mathcal {B},t}=\sqcap _{B(t)\in \mathcal {B}}B\).
We call the hypotheses that are constructed as in Theorem 11 constructible. This theorem states that every packed connection-minimal hypothesis is equivalent to a constructible hypothesis and vice versa. A constructible hypothesis is built from the concepts in one negative prime implicate in \(\mathcal {PI}^{g-}_\Sigma (\varPhi )\) and all matching concepts from prime implicates in \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\). The matching itself is determined by the Skolem terms that occur in all these clauses. The subterm relation between the terms of the clauses in \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\) and \(\mathcal {PI}^{g-}_\Sigma (\varPhi )\) is the same as the ancestor relation in the description trees of subsumers of \(C_1\) and subsumees of \(C_2\) respectively. The terms matching in positive and negative prime implicates allow us to identify where the missing entailments between a subsumer \(D_1\) of \(C_1\) and a subsumee \(D_2\) of \(C_2\) are. These missing entailments become the constructible \(\mathcal {H} \). The condition \(C_{\mathcal {B},t}\not \preceq _\sqcap C_{\mathcal {A},t}\) is a way to write that \(C_{\mathcal {A},t}\sqsubseteq C_{\mathcal {B},t}\) is not a tautology, which can be tested by subset inclusion.
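The construction in Theorem 11 can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the implementation: atoms are encoded as hypothetical pairs `(concept, term)`, the function name `constructible_hypothesis` is invented for this sketch, and the prime implicates are taken as given inputs.

```python
def constructible_hypothesis(neg_clause, pos_implicates):
    """Build a hypothesis from one negative prime implicate and the positive ones.

    neg_clause:     iterable of (B, t), read as the literals ¬B^-(t) of one clause
    pos_implicates: iterable of (A, t), read as the unit implicates A(t)
    Returns a set of CIs (lhs, rhs), each side a frozenset of conjuncts,
    or None if some term of the clause has no positive atom.
    """
    neg_clause = list(neg_clause)
    pos_implicates = list(pos_implicates)
    hypothesis = set()
    for t in {term for _, term in neg_clause}:
        lhs = frozenset(a for a, s in pos_implicates if s == t)  # conjuncts of C_{A,t}
        rhs = frozenset(b for b, s in neg_clause if s == t)      # conjuncts of C_{B,t}
        if not lhs:
            return None      # no A(t) among the positive implicates for this term
        if rhs <= lhs:
            continue         # C_{A,t} ⊑ C_{B,t} is a tautology, so it is skipped
        hypothesis.add((lhs, rhs))
    return hypothesis

# Inputs chosen to agree with the first solution of Example 12 (assumed values).
h = constructible_hypothesis(
    [("Researcher", "sk0")],
    [("Professor", "sk0"), ("Doctor", "sk0"), ("Chair", "sk1(sk0)")],
)
```

Collecting every positive atom on a matching term, rather than a subset of them, is what corresponds to packability in this sketch.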
The formal proof of this result is detailed in the technical report [16]. We sketch it briefly here. To start, we link the subsumers of \(C_1\) with \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\). This is done at the semantic level: We show that all Herbrand models of \(\varPhi \), i.e., models built on the symbols in \(\varPhi \), are also models of \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\), which is itself such a model. Then we show that \(C_1(\mathtt {sk}_0)\) as well as the formulas corresponding to the subsumers of \(C_1\) in our translation are satisfied by all Herbrand models. This follows from the fact that \(\varPhi \) is a set of Horn clauses. Next, we show, using a similar technique, how duplicate negative ground implicates, not necessarily prime, relate to subsumees of \(C_2\), with the restriction that there must exist a weak homomorphism from a description tree of a subsumer of \(C_1\) to a description tree of the considered subsumee of \(C_2\). Thus, \(\mathcal {H} \) provides the missing CIs that will turn the weak homomorphism into a \((\mathcal {T} \cup \mathcal {H})\)-homomorphism. Then, we establish an equivalence between the \(\preceq _\sqcap \)-minimality of the subsumee of \(C_2\) and the primality of the corresponding negative implicate. Packability is the last aspect we deal with, and its use is limited to the reconstruction. It holds because \(\mathcal {A}\) contains all \(A(t)\in {\mathcal {PI}^{g+}_\Sigma (\varPhi )}\) for all terms t occurring in \(\mathcal {B}\).
Example 12
Consider the abduction problem \(\langle \mathcal {T} _{\text {a}},\Sigma , \alpha _{\text {a}} \rangle \) where \(\Sigma \) contains all concepts from \(\mathcal {T} _{\text {a}} \). For the translation \(\varPhi \) of this problem, we have
where \(\mathtt {sk}_1\) is the Skolem function introduced for \(\mathsf {Professor} \sqsubseteq \exists \mathsf {employment}.\mathsf {Chair} \) and \(\mathtt {sk}_2\) is introduced for \(\mathsf {Doctor} \sqsubseteq \exists \mathsf {qualification}.\mathsf {PhD} \). This leads to two constructible solutions: \(\{\mathsf {Professor} \sqcap \mathsf {Doctor} \sqsubseteq \mathsf {Researcher} \}\) and \(\mathcal {H} _{\text {a}1} \), which are both packed connection-minimal hypotheses if \(\Sigma =\mathsf {N_C} \). Another example is presented in full detail in the technical report [16].
Termination. If \(\mathcal {T} \) contains cycles, there can be infinitely many prime implicates. For example, for \(\mathcal {T} =\{C_1\sqsubseteq A, A\sqsubseteq \exists r.A, \exists r. B\sqsubseteq B, B\sqsubseteq C_2\}\) both the positive and negative ground prime implicates of \(\varPhi \) are unbounded even though the set of constructible hypotheses is finite (as it is for any abduction problem):
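As an illustration of this unboundedness, the two sets can be sketched as follows. This is a reconstruction under assumptions: \(\mathtt {sk}\) is taken to be the Skolem function introduced for \(A\sqsubseteq \exists r.A\), only concept atoms are listed (role atoms and \(A_\top \) are left out), and the membership of each unit clause depends on \(\Sigma \) containing the corresponding concept.

```latex
\begin{align*}
\mathcal{PI}^{g+}_\Sigma(\varPhi) &\supseteq \{C_1(\mathtt{sk}_0),\ A(\mathtt{sk}_0),\ A(\mathtt{sk}(\mathtt{sk}_0)),\ A(\mathtt{sk}(\mathtt{sk}(\mathtt{sk}_0))),\ \ldots\},\\
\mathcal{PI}^{g-}_\Sigma(\varPhi) &\supseteq \{\lnot C_2^-(\mathtt{sk}_0),\ \lnot B^-(\mathtt{sk}_0),\ \lnot B^-(\mathtt{sk}(\mathtt{sk}_0)),\ \lnot B^-(\mathtt{sk}(\mathtt{sk}(\mathtt{sk}_0))),\ \ldots\}.
\end{align*}
```

Under these assumptions, matching \(\lnot B^-(t)\) with the positive atoms on the same term gives \(\{C_1\sqcap A\sqsubseteq B\}\) for \(t=\mathtt {sk}_0\) and the same hypothesis \(\{A\sqsubseteq B\}\) for every deeper term, which is why infinitely many implicates still yield only finitely many hypotheses.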
To find all constructible hypotheses of an abduction problem, an approach that simply computes all prime implicates of \(\varPhi \), e.g., using the standard resolution calculus, will never terminate on cyclic problems. However, if we look only for subset-minimal constructible hypotheses, termination can be achieved for cyclic and non-cyclic problems alike, because it is possible to construct all such hypotheses from prime implicates that have a polynomially bounded term depth, as shown below. To obtain this bound, we consider resolution derivations of the ground prime implicates and we show that they can be done under some restrictions that imply this bound.
Before performing resolution, we compute the presaturation \(\varPhi _p\) of the set of clauses \(\varPhi \), defined as
\[\varPhi _p = \varPhi \cup \{\lnot A(x)\vee B(x) \mid \varPhi \models \lnot A(x)\vee B(x)\},\]
where A and B are either both original or both duplicate atomic concepts. The presaturation can be efficiently computed before the translation, using a modern \(\mathcal {EL}\) reasoner such as Elk [23], which is highly optimized towards the computation of all entailments of the form \(A\sqsubseteq B\). While the presaturation computes nothing a resolution procedure could not derive, it is what allows us to bound the maximal depth of terms in inferences by that in prime implicates. If \(\varPhi _p\) is presaturated, we do not need to perform inferences that produce Skolem terms of a higher nesting depth than what is needed for the prime implicates.
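For intuition, the entailments \(A\sqsubseteq B\) needed for the presaturation can be computed on small inputs with a naive saturation over a normalised TBox. The sketch below is not Elk and not the method used in the implementation; it is a plain fixpoint over the textbook completion rules for \(\mathcal {EL}\) without \(\top \), with a hypothetical tuple encoding of axioms and an invented function name `atomic_subsumptions`.

```python
def atomic_subsumptions(concepts, axioms):
    """Return all pairs (A, B), A != B, of atomic concepts with T |= A ⊑ B.

    Hypothetical axiom encoding (normalised EL, no ⊤):
      ("sub", A, B)        A ⊑ B
      ("conj", A1, A2, B)  A1 ⊓ A2 ⊑ B
      ("exl", r, A, B)     ∃r.A ⊑ B
      ("exr", A, r, B)     A ⊑ ∃r.B
    `concepts` must list every atomic concept occurring in `axioms`.
    """
    concepts = list(concepts)
    subs = {a: {a} for a in concepts}     # subs[a]: atomic subsumers of a found so far
    succ = {a: set() for a in concepts}   # succ[a]: pairs (r, b) with a ⊑ ∃r.b
    changed = True
    while changed:
        changed = False
        for a in concepts:
            new_subs, new_succ = set(), set()
            for ax in axioms:
                kind = ax[0]
                if kind == "sub" and ax[1] in subs[a]:
                    new_subs.add(ax[2])
                elif kind == "conj" and ax[1] in subs[a] and ax[2] in subs[a]:
                    new_subs.add(ax[3])
                elif kind == "exr" and ax[1] in subs[a]:
                    new_succ.add((ax[2], ax[3]))
                elif kind == "exl":
                    _, r, x, y = ax
                    # a ⊑ ∃r.b and b ⊑ x together with ∃r.x ⊑ y give a ⊑ y
                    if any(r2 == r and x in subs[b] for r2, b in succ[a]):
                        new_subs.add(y)
            if not new_subs <= subs[a] or not new_succ <= succ[a]:
                subs[a] |= new_subs
                succ[a] |= new_succ
                changed = True
    return {(a, b) for a in concepts for b in subs[a] if a != b}

pairs = atomic_subsumptions(
    ["A", "B", "C", "D"],
    [("exr", "A", "r", "B"), ("sub", "B", "C"), ("exl", "r", "C", "D")],
)
```

Each returned pair \((A,B)\) would contribute the clause \(\lnot A(x)\vee B(x)\) to \(\varPhi _p\); running the same procedure on the duplicated TBox would give the clauses over duplicate concepts.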
Starting from the presaturated set \(\varPhi _p\), we can show that all the relevant prime implicates can be computed if we restrict all inferences to those where
R1: at least one premise contains a ground term,
R2: the resolvent contains at most one variable, and
R3: every literal in the resolvent contains Skolem terms of nesting depth at most \(n\times m\), where n is the number of atomic concepts in \(\varPhi \), and m is the number of occurrences of existential role restrictions in \(\mathcal {T} \).
The first restriction turns the derivation of \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\) and \(\mathcal {PI}^{g-}_\Sigma (\varPhi )\) into an SOS-resolution derivation [18] with set of support \(\{C_1(\mathtt {sk}_0),\lnot C_2^-(\mathtt {sk}_0)\}\), i.e., the only two clauses with ground terms in \(\varPhi \). This restriction is a straightforward consequence of our interest in computing only ground implicates, and of the fact that the non-ground clauses in \(\varPhi \) cannot entail the empty clause since every \(\mathcal {EL}\) TBox is consistent. The other restrictions are consequences of the following theorems, whose proofs are available in the technical report [16].
Theorem 13
Given an abduction problem and its translation \(\varPhi \), every constructible hypothesis can be built from prime implicates that are inferred under restriction R2.
In fact, for \({\mathcal {PI}^{g+}_\Sigma (\varPhi )}\) it is even possible to restrict inferences to generating only ground resolvents, as can be seen in the proof of Theorem 13, which directly looks at the kinds of clauses that are derivable by resolution from \(\varPhi \).
Theorem 14
Given an abduction problem and its translation \(\varPhi \), every subset-minimal constructible hypothesis can be built from prime implicates that have a nesting depth of at most \(n\times m\), where n is the number of atomic concepts in \(\varPhi \), and m is the number of occurrences of existential role restrictions in \(\mathcal {T} \).
The proof of Theorem 14 is based on a structure called a solution tree, which resembles a description tree, but with multiple labeling functions. It assigns to each node a Skolem term, a set of atomic concepts called positive label, and a single atomic concept called negative label. The nodes correspond to matching partners in a constructible hypothesis: The Skolem term is the term on which we match literals. The positive label collects the atomic concepts in the positive prime implicates containing that term. The maximal antichains of the tree, i.e., the maximal subsets of nodes s.t. no node is the ancestor of another, are such that their negative labels correspond to the literals in a derivable negative implicate. For every solution tree, the Skolem labels and negative labels of the leaves determine a negative prime implicate, and by combining the positive and negative labels of these leaves, we obtain a constructible hypothesis, called the solution of the tree. We show that from every solution tree with solution \(\mathcal {H} \) we can obtain a solution tree with solution \(\mathcal {H} '\subseteq \mathcal {H} \) s.t. on no path, there are two nodes that agree both on the head of their Skolem labeling and on the negative label. Furthermore, the number of head functions of Skolem labels is bounded by the total number m of Skolem functions, while the number of distinct negative labels is bounded by the number n of atomic concepts, bounding the depth of the solution tree for \(\mathcal {H} '\) at \(n\times m\). This justifies the bound in Theorem 14. This bound is rather loose. For the academia example, it is equal to \(22\times 6 = 132\).
5 Implementation
We implemented our method to compute all subset-minimal constructible hypotheses in the tool CAPI.^{Footnote 3} To compute the prime implicates, we used SPASS [34], a first-order theorem prover that includes resolution among other calculi. We implemented everything before and after the prime implicate computation in Java, including the parsing of ontologies, preprocessing (detailed below), clausification of the abduction problems, translation to SPASS input, as well as the parsing and processing of the output of SPASS to build the constructible hypotheses and filter out the non-subset-minimal ones. On the Java side, we used the OWL API for all DL-related functionalities [20], and the \(\mathcal {EL}\) reasoner Elk for computing the presaturations [23].
Preprocessing. Since realistic TBoxes can be too large to be processed by SPASS, we replace the background knowledge in the abduction problem by a subset of axioms relevant to the abduction problem. Specifically, we replace the abduction problem \((\mathcal {T},\Sigma ,C_1\sqsubseteq C_2)\) by the abduction problem \((\mathcal {M} _{C_1}^\bot \cup \mathcal {M} _{C_2}^\top ,\Sigma ,C_1\sqsubseteq C_2)\), where \(\mathcal {M} _{C_1}^\bot \) is the \(\bot \)-module of \(\mathcal {T} \) for the signature of \(C_1\), and \(\mathcal {M} _{C_2}^\top \) is the \(\top \)-module of \(\mathcal {T} \) for the signature of \(C_2\) [15]. Those notions are explained in the technical report [16]. Their relevant properties are that \(\mathcal {M} _{C_1}^\bot \) is a subset of \(\mathcal {T} \) s.t. \(\mathcal {M} _{C_1}^\bot \models C_1\sqsubseteq D\) iff \(\mathcal {T} \models C_1\sqsubseteq D\) for all concepts D, while \(\mathcal {M} _{C_2}^\top \) is a subset of \(\mathcal {T} \) that ensures \(\mathcal {M} _{C_2}^\top \models D\sqsubseteq C_2\) iff \(\mathcal {T} \models D\sqsubseteq C_2\) for all concepts D. It immediately follows that every connection-minimal hypothesis for the original problem \((\mathcal {T},\Sigma ,C_1\sqsubseteq C_2)\) is also a connection-minimal hypothesis for \((\mathcal {M} _{C_1}^\bot \cup \mathcal {M} _{C_2}^\top ,\Sigma ,C_1\sqsubseteq C_2)\). For the presaturation, we compute with Elk all CIs of the form \(A\sqsubseteq B\) s.t. \(\mathcal {M} _{C_1}^\bot \cup \mathcal {M} _{C_2}^\top \models A\sqsubseteq B\).
Prime implicates generation. We rely on a slightly modified version of SPASS v3.9 to compute all ground prime implicates. In particular, we added the possibility to limit the number of variables allowed in the resolvents to enforce R2. For each of the restrictions R1–R3 there is a corresponding flag (or set of flags) that is passed to SPASS as an argument.
Recombination. The construction of hypotheses from the prime implicates found in the previous stage starts with a straightforward process of matching negative prime implicates with a set of positive ones based on their Skolem terms. It is followed by subset-minimality tests to discard non-subset-minimal hypotheses, since, with the bound we enforce, there is no guarantee that these are valid constructible hypotheses because the negative ground implicates they are built upon may not be prime. If SPASS terminates due to a timeout instead of reaching the bound, then it is possible that some subset-minimal constructible hypotheses are not found, and thus, some non-constructible hypotheses may be kept. Note that these are in any case solutions to the abduction problem.
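The subset-minimality test at the end of this stage amounts to discarding every candidate that strictly contains another candidate. A minimal Python sketch, assuming hypotheses are represented as sets of hashable CIs (here plain strings, a hypothetical encoding) and with an invented function name:

```python
def subset_minimal(hypotheses):
    """Keep the hypotheses that have no strict subset among the candidates."""
    unique = {frozenset(h) for h in hypotheses}   # also removes exact duplicates
    return {h for h in unique if not any(other < h for other in unique)}

candidates = [
    {"Chair ⊑ ResearchPosition", "PhD ⊑ Diploma"},
    {"Chair ⊑ ResearchPosition", "PhD ⊑ Diploma", "Professor ⊑ Researcher"},
    {"Professor ⊓ Doctor ⊑ Researcher"},
]
kept = subset_minimal(candidates)
```

The quadratic comparison is adequate for a sketch; with the very large numbers of candidates reported in the experiments, sorting candidates by size first would avoid many comparisons.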
6 Experiments
There is no benchmark suite dedicated to TBox abduction in \(\mathcal {EL}\), so we created our own, using realistic ontologies from the biomedical domain. For this, we used ontologies from the 2017 snapshot of Bioportal [27]. We restricted each ontology to its \(\mathcal {EL}\) fragment by filtering out unsupported axioms, where we replaced domain axioms and n-ary equivalence axioms in the usual way [2]. Note that, even if the ontology contains more expressive axioms, an \(\mathcal {EL}\) hypothesis is still useful if found. From the resulting set of TBoxes, we selected those containing at least 1 and at most 50,000 axioms, resulting in a set of 387 \(\mathcal {EL}\) TBoxes. Precisely, they contained between 2 and 46,429 axioms, for an average of 3,039 and a median of 569. Towards obtaining realistic benchmarks, we created three different categories of abduction problems for each ontology \(\mathcal {T} \), where in each case, we used the signature of the entire ontology for \(\Sigma \).

Problems in ORIGIN use \(\mathcal {T} \) as background knowledge, and as observation a randomly chosen \(A\sqsubseteq B\) s.t. A and B are in the signature of \(\mathcal {T} \) and \(\mathcal {T} \not \models A\sqsubseteq B\). This covers the basic requirements of an abduction problem, but has the disadvantage that A and B can be completely unrelated in \(\mathcal {T} \).

Problems in JUSTIF contain as observation a randomly selected CI \(\alpha \) s.t., for the original TBox, \(\mathcal {T} \models \alpha \) and \(\alpha \not \in \mathcal {T} \). The background knowledge used is a justification for \(\alpha \) in \(\mathcal {T} \) [32], that is, a minimal subset \(\mathcal {I} \subseteq \mathcal {T} \) s.t. \(\mathcal {I} \models \alpha \), from which a randomly selected axiom is removed. The TBox is thus a smaller set of axioms extracted from a real ontology for which we know there is a way of producing the required entailment without adding it explicitly. Justifications were computed using functionalities of the OWL API and Elk.

Problems in REPAIR contain as observation a randomly selected CI \(\alpha \) s.t. \(\mathcal {T} \models \alpha \), and as background knowledge a repair for \(\alpha \) in \(\mathcal {T} \), which is a maximal subset \(\mathcal {R} \subseteq \mathcal {T} \) s.t. \(\mathcal {R} \not \models \alpha \). Repairs were computed using a justification-based algorithm [32] with justifications computed as for JUSTIF. This usually resulted in much larger TBoxes, where more axioms would be needed to establish the entailment.
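To illustrate how a JUSTIF problem is assembled, the sketch below computes one justification by deleting axioms one at a time and then removes a random axiom from it. It is a toy under assumptions: the actual benchmarks use the OWL API and Elk, whereas here entailment is replaced by a hypothetical oracle `entails` that only handles atomic CIs \(A\sqsubseteq B\) encoded as pairs, and all function names are invented.

```python
import random

def entails(axioms, goal):
    """Toy oracle: axioms and goal are pairs (A, B) read as A ⊑ B;
    entailment is reachability from A to B."""
    start, target = goal
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for lhs, rhs in axioms:
            if lhs == current and rhs not in seen:
                seen.add(rhs)
                todo.append(rhs)
    return target in seen

def justification(tbox, goal):
    """Deletion-based minimisation: drop every axiom the entailment does not need."""
    just = list(tbox)
    for axiom in list(tbox):
        rest = [other for other in just if other != axiom]
        if entails(rest, goal):
            just = rest
    return just

def justif_problem(tbox, goal, rng):
    """Background knowledge of a JUSTIF-style problem: a justification minus one axiom."""
    just = justification(tbox, goal)
    removed = rng.choice(just)
    return [axiom for axiom in just if axiom != removed]

tbox = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E")]
goal = ("A", "C")
background = justif_problem(tbox, goal, random.Random(0))
```

Because a justification is minimal, removing any single axiom from it is guaranteed to break the entailment, so the resulting pair of background knowledge and observation is a proper abduction problem.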
All experiments were run on Debian Linux (Intel Core i5-4590, 3.30 GHz, 23 GB Java heap size). The code and scripts used in the experiments are available online [17]. The three phases of the method (see Fig. 2) were each assigned a hard time limit of 90 s.
For each ontology, we attempted to create and translate 5 abduction problems of each category. This failed on some ontologies because either there was no corresponding entailment (25/28/25 failures out of the 387 ontologies for ORIGIN/JUSTIF/REPAIR), there was a timeout during the translation (5/5/5 failures for ORIGIN/JUSTIF/REPAIR), or because the computation of justifications caused an exception (-/2/0 failures for ORIGIN/JUSTIF/REPAIR). The final number of abduction problems for each category is in the first column of Table 1.
We then attempted to compute prime implicates for these benchmarks using SPASS. In addition to the hard time limit, we gave a soft time limit of 30 s to SPASS, after which it should stop exploring the search space and return the implicates already found. In Table 1 we show, for each category, the percentage of problems on which SPASS succeeded in computing a non-empty set of clauses (Success) and the percentage of problems on which SPASS terminated within the time limit, where all solutions are computed (Compl.). The high number of CIs in the background knowledge explains most of the cases where SPASS reached the soft time limit. In many of these cases, the bound on the term depth goes into the billions, rendering it useless in practice. However, the “Compl.” column shows that the bound is reached before the soft time limit in most cases.
The reconstruction never reached the hard time limit. We measured the median, average and maximal number of solutions found (#\(\mathcal {H}\)), size of solutions in number of CIs (\(|\mathcal {H} |\)), size of CIs from solutions in number of atomic concepts (\(|\alpha |\)), and SPASS runtime (time, in seconds), all reported in Table 1. Except for the simple JUSTIF problems, the number of solutions may become very large. At the same time, solutions always contain very few axioms (never more than 3), though the axioms themselves can become large. We also noticed that highly nested Skolem terms rarely lead to more hypotheses being found: 8/1/15 for ORIGIN/JUSTIF/REPAIR, and the largest nesting depth used was: 3/1/2 for ORIGIN/JUSTIF/REPAIR. This hints at the fact that longer time limits would not have produced more solutions, and motivates future research into redundancy criteria to stop derivations (much) earlier.
7 Conclusion
We have introduced connection-minimal TBox abduction for \(\mathcal {EL}\) which finds parsimonious hypotheses, ruling out the ones that entail the observation in an arbitrary fashion. We have established a formal link between the generation of connection-minimal hypotheses in \(\mathcal {EL}\) and the generation of prime implicates of a translation \(\varPhi \) of the problem to first-order logic. In addition to obtaining these theoretical results, we developed a prototype for the computation of subset-minimal constructible hypotheses, a subclass of connection-minimal hypotheses that is easy to construct from the prime implicates of \(\varPhi \). Our prototype uses the SPASS theorem prover as an SOS-resolution engine to generate the needed implicates. We tested this tool on a set of realistic medical ontologies, and the results indicate that the cost of computing connection-minimal hypotheses is high but not prohibitive.
We see several ways to improve our technique. The bound we computed to ensure termination could be advantageously replaced by a redundancy criterion discarding irrelevant implicates long before it is reached, thus greatly speeding up computation in SPASS. We believe it should also be possible to further constrain inferences, e.g., to have them produce ground clauses only, or to generate the prime implicates with terms of increasing depth in a controlled incremental way instead of enforcing the soft time limit, but these two ideas remain to be proved feasible. As an alternative to using prime implicates, one may investigate direct methods for computing connection-minimal hypotheses in \(\mathcal {EL}\).
The theoretical worst-case complexity of connection-minimal abduction is another open question. Our method only gives a very high upper bound: by bounding only the nesting depth of Skolem terms polynomially as we did with Theorem 14, we may still permit clauses with exponentially many literals, and thus doubly exponentially many clauses in the worst case, which would give us a 2ExpTime upper bound for the problem of computing all subset-minimal constructible hypotheses. Using structure-sharing and guessing, it is likely possible to obtain a tighter upper bound. We have not yet looked at lower bounds for the complexity either.
While this work focuses on abduction problems where the observation is a CI, we believe that our technique can be generalised to knowledge bases that also contain ground facts (ABoxes), and to observations that take the form of conjunctive queries on the ABoxes in such knowledge bases. The motivation for such an extension is to understand why a particular query does not return any results, and to compute a set of TBox axioms that fix this problem. Since our translation already transforms the observation into ground facts, it should be possible to extend it to this setting. We would also like to generalise TBox abduction by finding a reasonable way to allow role restrictions in the hypotheses, and to extend connection-minimality to more expressive DLs such as \(\mathcal {ALC}\).
Notes
 1.
 2.
 3. Available under https://lat.inf.tu-dresden.de/~koopmann/CAPI.
References
1. Baader, F., Brandt, S., Lutz, C.: Pushing the \(\cal{EL}\) envelope. In: Kaelbling, L.P., Saffiotti, A. (eds.) IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, 30 July - 5 August 2005, pp. 364–369. Professional Book Center (2005). http://ijcai.org/Proceedings/05/Papers/0372.pdf
2. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description Logic. Cambridge University Press, Cambridge (2017). https://doi.org/10.1017/9781139025355
3. Baader, F., Küsters, R., Molitor, R.: Computing least common subsumers in description logics with existential restrictions. In: Proceedings of IJCAI 1999, pp. 96–103. Morgan Kaufmann (1999)
4. Bachmair, L., Ganzinger, H.: Resolution theorem proving. In: Robinson, J.A., Voronkov, A. (eds.) Handbook of Automated Reasoning (in 2 volumes), pp. 19–99. Elsevier and MIT Press, Cambridge (2001). https://doi.org/10.1016/b978-044450813-3/50004-7
5. Bauer, J., Sattler, U., Parsia, B.: Explaining by example: model exploration for ontology comprehension. In: Grau, B.C., Horrocks, I., Motik, B., Sattler, U. (eds.) Proceedings of the 22nd International Workshop on Description Logics (DL 2009), Oxford, UK, 27–30 July 2009. CEUR Workshop Proceedings, vol. 477. CEUR-WS.org (2009). http://ceur-ws.org/Vol-477/paper_37.pdf
6. Bienvenu, M.: Complexity of abduction in the \(\cal{EL}\) family of lightweight description logics. In: Proceedings of KR 2008, pp. 220–230. AAAI Press (2008). http://www.aaai.org/Library/KR/2008/kr08-022.php
7. Calvanese, D., Ortiz, M., Simkus, M., Stefanoni, G.: Reasoning about explanations for negative query answers in DL-Lite. J. Artif. Intell. Res. 48, 635–669 (2013). https://doi.org/10.1613/jair.3870
8. Ceylan, İ.İ., Lukasiewicz, T., Malizia, E., Molinaro, C., Vaicenavicius, A.: Explanations for negative query answers under existential rules. In: Calvanese, D., Erdem, E., Thielscher, M. (eds.) Proceedings of KR 2020, pp. 223–232. AAAI Press (2020). https://doi.org/10.24963/kr.2020/23
9. Del-Pinto, W., Schmidt, R.A.: ABox abduction via forgetting in \(\cal{ALC}\). In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pp. 2768–2775. AAAI Press (2019). https://doi.org/10.1609/aaai.v33i01.33012768
10. Du, J., Qi, G., Shen, Y., Pan, J.Z.: Towards practical ABox abduction in large description logic ontologies. Int. J. Semantic Web Inf. Syst. 8(2), 1–33 (2012). https://doi.org/10.4018/jswis.2012040101
11. Du, J., Wan, H., Ma, H.: Practical TBox abduction based on justification patterns. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 1100–1106 (2017). http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14402
12. Du, J., Wang, K., Shen, Y.: A tractable approach to ABox abduction over description logic ontologies. In: Brodley, C.E., Stone, P. (eds.) Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pp. 1034–1040. AAAI Press (2014). http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8191
13. Echenim, M., Peltier, N., Sellami, Y.: A generic framework for implicate generation modulo theories. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS (LNAI), vol. 10900, pp. 279–294. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94205-6_19
14. Elsenbroich, C., Kutz, O., Sattler, U.: A case for abductive reasoning over ontologies. In: Proceedings of the OWLED’06 Workshop on OWL: Experiences and Directions (2006). http://ceur-ws.org/Vol-216/submission_25.pdf
15. Grau, B.C., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: theory and practice. J. Artif. Intell. Res. 31, 273–318 (2008). https://doi.org/10.1613/jair.2375
16. Haifani, F., Koopmann, P., Tourret, S., Weidenbach, C.: Connection-minimal abduction in \(\cal{EL}\) via translation to FOL - technical report (2022). https://doi.org/10.48550/ARXIV.2205.08449, https://arxiv.org/abs/2205.08449
17. Haifani, F., Koopmann, P., Tourret, S., Weidenbach, C.: Experiment data for the paper Connection-minimal Abduction in EL via translation to FOL, May 2022. https://doi.org/10.5281/zenodo.6563656
18. Haifani, F., Tourret, S., Weidenbach, C.: Generalized completeness for SOS resolution and its application to a new notion of relevance. In: Platzer, A., Sutcliffe, G. (eds.) CADE 2021. LNCS (LNAI), vol. 12699, pp. 327–343. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79876-5_19
19. Halland, K., Britz, K.: ABox abduction in \(\cal{ALC}\) using a DL tableau. In: 2012 South African Institute of Computer Scientists and Information Technologists Conference, SAICSIT ’12, pp. 51–58 (2012). https://doi.org/10.1145/2389836.2389843
20. Horridge, M., Bechhofer, S.: The OWL API: a Java API for OWL ontologies. Semant. Web 2(1), 11–21 (2011). https://doi.org/10.3233/SW-2011-0025
21. Horridge, M., Parsia, B., Sattler, U.: Explanation of OWL entailments in Protege 4. In: Bizer, C., Joshi, A. (eds.) Proceedings of the Poster and Demonstration Session at the 7th International Semantic Web Conference (ISWC2008), Karlsruhe, Germany, 28 October 2008. CEUR Workshop Proceedings, vol. 401. CEUR-WS.org (2008). http://ceur-ws.org/Vol-401/iswc2008pd_submission_47.pdf
22. Kazakov, Y., Klinov, P., Stupnikov, A.: Towards reusable explanation services in Protege. In: Artale, A., Glimm, B., Kontchakov, R. (eds.) Proceedings of the 30th International Workshop on Description Logics, Montpellier, France, 18–21 July 2017. CEUR Workshop Proceedings, vol. 1879. CEUR-WS.org (2017). http://ceur-ws.org/Vol-1879/paper31.pdf
23. Kazakov, Y., Krötzsch, M., Simancik, F.: The incredible ELK - from polynomial procedures to efficient reasoning with \(\cal{EL}\) ontologies. J. Autom. Reason. 53(1), 1–61 (2014). https://doi.org/10.1007/s10817-013-9296-3
24. Klarman, S., Endriss, U., Schlobach, S.: ABox abduction in the description logic \(\cal{ALC}\). J. Autom. Reason. 46(1), 43–80 (2011). https://doi.org/10.1007/s10817-010-9168-z
25. Koopmann, P.: Signature-based abduction with fresh individuals and complex concepts for description logics. In: Zhou, Z. (ed.) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, Virtual Event/Montreal, Canada, 19–27 August 2021, pp. 1929–1935 (2021). https://doi.org/10.24963/ijcai.2021/266
26. Koopmann, P., Del-Pinto, W., Tourret, S., Schmidt, R.A.: Signature-based abduction for expressive description logics. In: Calvanese, D., Erdem, E., Thielscher, M. (eds.) Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020, pp. 592–602. AAAI Press (2020). https://doi.org/10.24963/kr.2020/59
27. Matentzoglu, N., Parsia, B.: Bioportal snapshot 30.03.2017 (2017). https://doi.org/10.5281/zenodo.439510
28. Nabeshima, H., Iwanuma, K., Inoue, K., Ray, O.: SOLAR: an automated deduction system for consequence finding. AI Commun. 23(2–3), 183–203 (2010). https://doi.org/10.3233/AIC-2010-0465
29. Parsia, B., Matentzoglu, N., Gonçalves, R.S., Glimm, B., Steigmiller, A.: The OWL reasoner evaluation (ORE) 2015 competition report. J. Autom. Reason. 59(4), 455–482 (2017). https://doi.org/10.1007/s10817-017-9406-8
30. Pukancová, J., Homola, M.: Tableau-based ABox abduction for the \(\cal{ALCHO}\) description logic. In: Proceedings of the 30th International Workshop on Description Logics (2017). http://ceur-ws.org/Vol-1879/paper11.pdf
31. Pukancová, J., Homola, M.: The AAA ABox abduction solver. KI - Künstliche Intell. 34(4), 517–522 (2020). https://doi.org/10.1007/s13218-020-00685-4
32. Schlobach, S., Cornet, R.: Non-standard reasoning services for the debugging of description logic terminologies. In: Gottlob, G., Walsh, T. (eds.) Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), pp. 355–362. Morgan Kaufmann, Acapulco, Mexico (2003). http://ijcai.org/Proceedings/03/Papers/053.pdf
33. Wei-Kleiner, F., Dragisic, Z., Lambrix, P.: Abduction framework for repairing incomplete \(\cal{EL}\) ontologies: complexity results and algorithms. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pp. 1120–1127. AAAI Press (2014). http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8239
34. Weidenbach, C., Schmidt, R.A., Hillenbrand, T., Rusev, R., Topic, D.: System description: Spass version 3.0. In: Pfenning, F. (ed.) CADE 2007. LNCS (LNAI), vol. 4603, pp. 514–520. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73595-3_38
35. Wos, L., Robinson, G., Carson, D.: Efficiency and completeness of the set of support strategy in theorem proving. J. ACM 12(4), 536–541 (1965)
Acknowledgments
This work was supported by the Deutsche Forschungsgemeinschaft (DFG), Grant 389792660 within TRR 248.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this paper
Cite this paper
Haifani, F., Koopmann, P., Tourret, S., Weidenbach, C. (2022). Connection-Minimal Abduction in \(\mathcal {EL}\) via Translation to FOL. In: Blanchette, J., Kovács, L., Pattinson, D. (eds) Automated Reasoning. IJCAR 2022. Lecture Notes in Computer Science, vol 13385. Springer, Cham. https://doi.org/10.1007/978-3-031-10769-6_12
DOI: https://doi.org/10.1007/978-3-031-10769-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10768-9
Online ISBN: 978-3-031-10769-6