Finding Good Proofs for Description Logic Entailments Using Recursive Quality Measures (Extended Technical Report)

Logic-based approaches to AI have the advantage that their behavior can in principle be explained to a user. If, for instance, a Description Logic reasoner derives a consequence that triggers some action of the overall system, then one can explain such an entailment by presenting a proof of the consequence in an appropriate calculus. How comprehensible such a proof is depends not only on the employed calculus, but also on the properties of the particular proof, such as its overall size, its depth, the complexity of the employed sentences and proof steps, etc. For this reason, we want to determine the complexity of generating proofs that are below a certain threshold w.r.t. a given measure of proof quality. Rather than investigating this problem for a fixed proof calculus and a fixed measure, we aim for general results that hold for wide classes of calculi and measures. In previous work, we first restricted the attention to a setting where proof size is used to measure the quality of a proof. We then extended the approach to a more general setting, but important measures such as proof depth were not covered. In the present paper, we provide results for a class of measures called recursive, which yields lower complexities and also encompasses proof depth. In addition, we close some gaps left open in our previous work, thus providing a comprehensive picture of the complexity landscape.


Introduction
Explainability has developed into a major issue in Artificial Intelligence, particularly in the context of sub-symbolic approaches based on Machine Learning [6]. In contrast, results produced by symbolic approaches based on logical reasoning are "explainable by design" since a derived consequence can be formally justified by showing a proof for it. In practice, things are not that easy since proofs may be very long, and even single proof steps or stated sentences may be hard to comprehend for a user who is not an expert in logic. For this reason, there has been considerable work in the Automated Deduction and Logic in AI communities on how to produce "good" proofs for certain purposes, both for full first-order logic and for decidable logics such as Description Logics (DLs) [9].
We mention here only a few approaches, and refer the reader to the introduction of our previous work [2] for a more detailed review.
First, there is work that transforms proofs that are produced by an automated reasoning system into ones in a calculus that is deemed to be more appropriate for human consumption [11,22,23]. Second, abstraction techniques are used to reduce the size of proofs by introducing definitions, lemmas, and more abstract deduction rules [16,17]. Justification-based explanations for DLs [10,14,29] can be seen as a radical abstraction technique where the abstracted proof consists of a single proof step, from a minimal set of stated sentences that implies a certain consequence directly to this consequence. Finally, instead of presenting proofs in a formal, logical syntax, one can also try to increase readability by translating them into natural language text [12,25,27,28] or visualizing them [5].
The purpose of this work is of a more (complexity) theoretic nature. We want to investigate how hard it is to find good proofs, where the quality of a proof is described by a measure m that assigns non-negative rational numbers to proofs. More precisely, as usual we investigate the complexity of the corresponding decision problem, i.e., the problem of deciding whether there is a proof P with m(P) ≤ q for a given rational number q. In order to abstract from specific logics and proof calculi, we develop a general framework in which proofs are represented as labeled, directed hypergraphs, whose hyperedges correspond to single sound derivation steps. To separate the complexity of generating good proofs from the complexity of reasoning in the underlying logic, we introduce the notion of a deriver, which generates a so-called derivation structure. This structure consists of possible proof steps, from which all proofs of the given consequence can be constructed. Basically, such a derivation structure can be seen as consisting of all relevant instantiations of the rules of a calculus that can be used to derive the consequence. We restrict the attention to decidable logics and consider derivers that produce derivation structures of polynomial or exponential size. Examples of such derivers are consequence-based reasoners for the DLs EL [7,21] and ELI [9,18], respectively. In our complexity results, the derivation structure is assumed to be already computed by the deriver, i.e., the complexity of this step is not assumed to be part of the complexity of computing good proofs. Our complexity results investigate the problem along the following orthogonal dimensions: we distinguish between (i) polynomial and exponential derivers; and (ii) whether the threshold value q is encoded in unary or binary.
The obtained complexity upper bounds hold for all instances of a considered setting, whereas the lower bounds mean that there is an instance (usually based on EL or ELI) for which this lower bound can be proved.
In our first work in this direction [2], we focused our attention on size as the measure of proof quality. We could show that the above decision problem is NP-complete even for polynomial derivers and unary coding of numbers. For exponential derivers, the complexity depends on the coding of numbers: NP-complete (NExpTime-complete) for unary (binary) coding. For the related measure tree size (which assumes that the proof hypergraphs are tree-shaped, i.e. cannot reuse already derived consequences), the complexity turned out to be considerably lower, due to the fact that a Dijkstra-like greedy algorithm can be applied. In [3], we generalized the results by introducing a class of measures called Ψ-measures, which contains both size and tree size and for which the same complexity upper bounds as for size could be shown for polynomial derivers. We also lifted the better upper bounds for tree size (for polynomial derivers) to local Ψ-measures, a natural class of proof measures. In this paper, we extend this line of research by providing a more general notion of measures, monotone recursive Φ-measures, which now also allow us to measure the depth of a proof. We think that depth is an important measure since it measures how much of the proof tree a (human or automated) proof checker needs to keep in memory at the same time. We analyze these measures not only for polynomial derivers, but this time also consider exponential derivers, thus giving insights on how our complexity results transfer to more expressive logics. In addition to upper bounds for the general class of monotone recursive Φ-measures, we show improved bounds for the specific measures considering depth and tree size, in the latter case improving results from [2]. Overall, we thus obtain a comprehensive picture of the complexity landscape for the problem of finding good proofs for DL and other entailments (see Table 1).
This is an extended version of the paper [4], including an appendix with more detailed proofs and some auxiliary lemmas.

Preliminaries
Most of our theoretical discussion applies to arbitrary logics L = (S L , |= L ) that consist of a set S L of L-sentences and a consequence relation |= L ⊆ P (S L ) × S L between L-theories, i.e. subsets of L-sentences, and single L-sentences. We assume that |= L has a semantic definition, i.e. for some definition of "model", T |= L η holds iff every model of all elements in T is also a model of η. We also assume that the size |η| of an L-sentence η is defined in some way, e.g. by the number of symbols in η. Since L is usually fixed, we drop the prefix "L-" from now on. For example, L could be first-order logic. However, we are mainly interested in proofs for DLs, which can be seen as decidable fragments of first-order logic [9]. In particular, we use specific DLs to show our hardness results. The syntax of DLs is based on disjoint, countably infinite sets N C and N R of concept names A, B, . . . and role names r, s, . . . , respectively. Sentences of the DL EL, called general concept inclusions (GCIs), are of the form C ⊑ D, where C and D are EL-concepts, which are built from concept names by applying the constructors ⊤ (top), C ⊓ D (conjunction), and ∃r.C (existential restriction for a role name r). The DL ELI extends EL by the role constructor r − (inverse role). In DLs, finite theories are called TBoxes or ontologies.
The semantics of DLs is based on first-order interpretations; for details, see [9]. In Figure 1, we depict a simplified version of the inference rules for EL from [21].
Deciding consequences in EL is P-complete [7], and in ELI it is ExpTime-complete [8].

Proofs
We formalize proofs as (labeled, directed) hypergraphs (see Figures 2,3), which are tuples (V, E, ℓ) consisting of a finite set V of vertices, a finite set E of (hyper)edges of the form (S, d) with S ⊆ V and d ∈ V , and a vertex labeling function ℓ : V → S L . Full definitions of such hypergraphs, as well as related notions such as trees, unravelings, homomorphisms, cycles can be found in the appendix. For example, there is a homomorphism from Figure 3 to Figure 2, but not vice versa, and Figure 3 is the tree unraveling of Figure 2.

Fig. 3. A tree hypergraph/proof
The following definition formalizes basic requirements for hyperedges to be considered valid inference steps from a given finite theory.

Definition 1 (Derivation Structure).
A derivation structure D = (V, E, ℓ) over a finite theory T is a hypergraph in which every leaf is labeled by a sentence from T and every hyperedge (S, d) ∈ E is a sound inference, i.e. {ℓ(s) | s ∈ S} |= L ℓ(d) holds.

We define proofs as special derivation structures that derive a conclusion.

Definition 2 (Proof). Given a conclusion η and a finite theory T , a proof for T |= η is a derivation structure P = (V, E, ℓ) over T such that
- P contains exactly one sink v η ∈ V , which is labeled by η, and
- P is acyclic, and every vertex has at most one incoming edge, i.e. there is no vertex with two distinct incoming hyperedges.

A tree proof is a proof that is a tree. A subproof S of a hypergraph H is a subgraph of H that is a proof s.t. the leaves of S are a subset of the leaves of H.
The hypergraphs in Figures 2 and 3 can be seen as proofs in the sense of Definition 2, where the sentences of the theory are marked with a thick border. Both proofs use the same inference steps, but have different numbers of vertices. They both prove A ⊑ B ⊓ ∃r.A from T = {A ⊑ B, B ⊑ ∃r.A}. The second proof is a tree and the first one a hypergraph without label repetition. Given a proof P = (V, E, ℓ) and a vertex v ∈ V , the subproof of P with sink v is the largest subgraph P v = (V v , E v , ℓ v ) of P where V v contains all vertices in V that have a path to v in P.
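The structural conditions on proofs can be checked mechanically. The following is a minimal sketch, assuming hyperedges are encoded as pairs `(frozenset_of_sources, target)` and sentences as plain strings; the helper names and the two inference steps of the example are illustrative assumptions (the figures are not reproduced here), and soundness of the individual hyperedges is not checked since that would require a reasoner.

```python
def leaves(V, E):
    """Vertices without incoming hyperedges."""
    return {v for v in V if all(d != v for _, d in E)}


def sinks(V, E):
    """Vertices without outgoing hyperedges."""
    return {v for v in V if all(v not in S for S, _ in E)}


def is_acyclic(V, E):
    """DFS on the graph that has an arc s -> d for every (S, d) in E and s in S."""
    succ = {v: set() for v in V}
    for S, d in E:
        for s in S:
            succ[s].add(d)
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(v):
        if state.get(v) == 1:
            return False  # back edge: a cycle was found
        if state.get(v) == 2:
            return True
        state[v] = 1
        ok = all(visit(w) for w in succ[v])
        state[v] = 2
        return ok

    return all(visit(v) for v in V)


def is_proof(V, E, label, theory, eta):
    """Structural check only: one sink labeled eta, acyclic, at most one
    incoming hyperedge per vertex, and all leaves labeled by theory sentences."""
    snk = sinks(V, E)
    incoming = {v: [e for e in E if e[1] == v] for v in V}
    return (len(snk) == 1
            and label[next(iter(snk))] == eta
            and is_acyclic(V, E)
            and all(len(es) <= 1 for es in incoming.values())
            and all(label[v] in theory for v in leaves(V, E)))


# Hypothetical encoding of a proof of A ⊑ B ⊓ ∃r.A without label repetition.
THEORY = {"A ⊑ B", "B ⊑ ∃r.A"}
V = {1, 2, 3, 4}
LABEL = {1: "A ⊑ B", 2: "B ⊑ ∃r.A", 3: "A ⊑ ∃r.A", 4: "A ⊑ B ⊓ ∃r.A"}
E = {(frozenset({1, 2}), 3), (frozenset({1, 3}), 4)}

print(is_proof(V, E, LABEL, THEORY, "A ⊑ B ⊓ ∃r.A"))
```

A tree-shaped variant of the same proof would duplicate the vertex labeled A ⊑ B, so that no vertex is used as a premise twice.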

Derivers
In practice, proofs and derivation structures are constructed by a reasoning system, and in theoretical investigations, it is common to define proofs by means of a calculus. To abstract from these details, we use the concept of a deriver as in [2], which is a function that, given a theory T and a conclusion η, produces the corresponding derivation structure in which we can look for an optimal proof. However, in practice, it would be inefficient and unnecessary to compute the entire derivation structure beforehand when looking for an optimal proof. Instead, we allow to access elements in a derivation structure using an oracle, which we can ask whether given inferences are a part of the current derivation structure. Similar functionality exists for example for the DL reasoner Elk [19], and may correspond to checking whether the inference is an instance of a rule in the calculus. Since reasoners may not be complete for proving arbitrary sentences of L, we restrict the conclusion η to a subset C L ⊆ S L of supported consequences.

Fig. 4. The inference rules for ELI [9]. Given a finite theory T in a certain normal form, the rules produce a saturated theory T ′ . Here, K, L, M are conjunctions of concept names, A is a concept name, C is an ELI concept of the form A, ∃r.M , or ∀r.A, and r is a role name or the inverse of a role name. In this calculus conjunctions are implicitly viewed as sets, i.e. the order and multiplicity of conjuncts is ignored.

Definition 4 (Deriver).
A deriver D is given by a set C L ⊆ S L and a function that assigns derivation structures to pairs (T , η) of finite theories T ⊆ S L and sentences η ∈ C L , such that T |= η iff D(T , η) contains a proof for T |= η.

Elk is an example of a polynomial deriver, that is, for a given EL theory T and EL sentence η, Elk(T , η) contains all allowed instances of the rules shown in Figure 1. As an example for an exponential deriver we use Eli, which uses the rules from Figure 4.

In this paper, we focus on polynomial and exponential derivers, for which we further make the following technical assumptions: 1) D(T , η) does not contain two vertices with the same label; 2) the number of premises in an inference is polynomially bounded by |T | and |η|; and 3) the size of each label is polynomially bounded by |T | and |η|. While 1) is without loss of generality, 2) and 3) are not. If a deriver does not satisfy 2), we may be able to fix this by splitting inference steps. Assumption 3) would not work for derivers with higher complexity, but is required in our setting to avoid trivial complexity results for exponential derivers. We furthermore assume that for polynomial and exponential derivers, the polynomial p from Definition 4 bounding the size of derivation structures is known.

Measuring Proofs
To formally study quality measures for proofs, we developed the following definition, which will be instantiated with concrete measures later. Our goal is to find proofs that minimize these measures, i.e. lower numbers are better.
Definition 5. A (quality) measure is a function m : P L → Q ≥0 , where P L is the set of all proofs over L and Q ≥0 is the set of non-negative rational numbers. We call m a Φ-measure if, for every P ∈ P L , the following hold.
[P] m(P) is computable in polynomial time in the size of P.
[HI] Let h : P → H be any homomorphism, and P ′ be any subproof of the homomorphic image h(P) that is minimal (w.r.t. m) among all such subproofs having the same sink. Then m(P ′ ) ≤ m(P).
Intuitively, a Φ-measure m does not increase when the proof gets smaller, either when parts of the proof are removed (to obtain a subproof) or when parts are merged (in a homomorphic image). For example, m size ((V, E, ℓ)) := |V | is a Φ-measure, called the size of a proof, and we have already investigated the complexity of the following decision problem for m size in [2].

Definition 6 (Optimal Proof). Let D be a deriver and m be a measure. Given a finite theory T and a sentence
The associated decision problem, denoted OP(D, m), is to decide, given T and η as above and q ∈ Q ≥0 , whether there is an admissible proof P w.r.t. D(T , η) with m(P) ≤ q.
For our complexity analysis, we distinguish the encoding of q with a subscript (unary/binary), e.g. OP unary (D, m). We first show that if P is optimal w.r.t. a Φ-measure m and D(T , η), then the homomorphic image of P in D(T , η) is also a proof. Thus, to decide OP(D, m) we can restrict our search to proofs that are subgraphs of D(T , η).

Lemma 7. For any deriver
In particular, this shows that an optimal proof always exists. Proof. By Definition 4, the derivation structure D(T , η) contains at least one proof for T |= η. Since D(T , η) is finite, there are finitely many proofs for T |= η contained in D(T , η). The finite set of all m-weights of these proofs always has a minimum. Finally, if there were an admissible proof weighing less than this minimum, it would contradict Lemma 7.

Monotone Recursive Measures
Since the complexity of OP(D, m) for Φ-measures in general is quite high [2], in this paper we focus on a subclass of measures that can be evaluated recursively.
Such a measure is monotone if, for any multiset Q, whenever q ∈ Q and

Intuitively, a recursive measure m can be computed in a bottom-up fashion starting with the weights of the leaves given by leaf m . The function edge m is used to recursively combine the weights of the direct subproofs into a weight for the full proof. This function is well-defined since in a proof every vertex has at most one incoming edge. We require edge m to be defined only for inputs ((S, α), Q) that actually correspond to a valid proof in L, i.e. where S |= L α and Q consists of the weights of some proofs for the sentences in S. For example, if m always yields natural numbers, we obviously do not need edge m to be defined for multisets containing fractional numbers.
In this paper, we are particularly interested in the following monotone recursive Φ-measures.
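- The depth m depth of a proof is defined by leaf m depth (α) := 0 and edge m depth ((S, α), Q) := 1 + max Q.
- The tree size m tree of a proof is defined by leaf m tree (α) := 1 and edge m tree ((S, α), Q) := 1 + Σ Q, i.e. one plus the sum of all weights in Q.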
What distinguishes tree size from size is that vertices are counted multiple times if they are used in several subproofs. The name tree size is inspired by the fact that it can be interpreted as the size of the tree unraveling of a given proof (cf. Figures 2 and 3). In fact, we show in the appendix that all recursive Φ-measures are invariant under unraveling. This indicates that tree size, depth and other monotone recursive Φ-measures are especially well-suited for cases where proofs are presented to users in the form of trees. This is for example the case for the proof plugin for Protégé [20].
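The bottom-up evaluation of a recursive measure can be illustrated with a short sketch. It assumes that a proof is given by a map `incoming` from each non-leaf vertex to its unique premise set, that depth uses leaf weight 0 and 1 + max, and that tree size uses leaf weight 1 and 1 + sum (consistent with counting each vertex once per use). The example proof and all identifiers are hypothetical.

```python
def recursive_measure(sink, incoming, label, leaf_fn, edge_fn):
    """Evaluate a recursive measure on the subproof with the given sink."""
    if sink not in incoming:  # leaf: no incoming hyperedge
        return leaf_fn(label[sink])
    premises = incoming[sink]
    weights = [recursive_measure(s, incoming, label, leaf_fn, edge_fn)
               for s in premises]
    edge_labels = (frozenset(label[s] for s in premises), label[sink])
    return edge_fn(edge_labels, weights)


# Depth: leaves weigh 0, an inference adds 1 to the deepest premise.
DEPTH = dict(leaf_fn=lambda alpha: 0,
             edge_fn=lambda edge, Q: 1 + max(Q, default=0))
# Tree size: leaves weigh 1, an inference adds 1 to the sum of all premises.
TREE_SIZE = dict(leaf_fn=lambda alpha: 1,
                 edge_fn=lambda edge, Q: 1 + sum(Q))

# Hypothetical proof of A ⊑ B ⊓ ∃r.A in which vertex 1 is used twice.
LABEL = {1: "A ⊑ B", 2: "B ⊑ ∃r.A", 3: "A ⊑ ∃r.A", 4: "A ⊑ B ⊓ ∃r.A"}
INCOMING = {3: (1, 2), 4: (1, 3)}

depth = recursive_measure(4, INCOMING, LABEL, **DEPTH)
tree_size = recursive_measure(4, INCOMING, LABEL, **TREE_SIZE)
size = len(LABEL)  # m_size simply counts vertices
print(depth, tree_size, size)
```

In this example the reused vertex 1 is counted twice by tree size but only once by size, which is exactly the difference between the proof and its tree unraveling.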

Lemma 10.
Depth and tree size are monotone recursive Φ-measures.

Algorithm 1: A Dijkstra-like algorithm
if k(e) = |S| then // all source vertices have been reached

Complexity Results
We investigate the decision problem OP for monotone recursive Φ-measures. We first show upper bounds for the general case, and then consider measures for depth and tree size, for which we obtain even lower bounds. An artificial modification of the depth measure gives a lower bound matching the general upper bound even if unary encoding is used for the threshold q.

The General Case
Algorithm 1 describes a Dijkstra-like approach that is inspired by the algorithm in [13] for finding minimal hyperpaths w.r.t. so-called additive weighting functions, which represent a subclass of monotone recursive Φ-measures. The algorithm progressively discovers proofs P(v) for ℓ(v) that are contained in D(T , η). If it reaches a new vertex v in this process, this vertex is added to the set Q. In each step, a vertex with minimal weight m(P(v)) is chosen and removed from Q. For each hyperedge e = (S, d) ∈ E, a counter k(e) is maintained that is increased whenever a vertex v ∈ S is chosen. Once this counter reaches |S|, we know that all source vertices of e have been processed. The algorithm then constructs a new proof P for ℓ(d) by joining the proofs for the source vertices using the current hyperedge e. This proof P is then compared to the best previously known proof P(d) for ℓ(d) and P(d) is updated accordingly. For Line 20, recall that we assumed D(T , η) to contain no two vertices with the same label, and hence it contains a unique vertex v η with label η. Lemma 11. For any monotone recursive Φ-measure m and deriver D, Algorithm 1 computes an optimal proof in time polynomial in the size of D(T , η).
Since we can actually compute an optimal proof in polynomial time in the size of the whole derivation structure, it is irrelevant how the upper bound q in the decision problem OP is encoded, and hence the following results follow.

Theorem 12.
For any monotone recursive Φ-measure m and polynomial deriver D, OP binary (D, m) is in P. It is in ExpTime for all exponential derivers D.
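The Dijkstra-like procedure described for Algorithm 1 can be sketched as follows. This is an illustrative reconstruction from the prose description, not the authors' pseudocode: the encoding of hyperedges as `(frozenset_of_sources, target)`, the function names, and the example derivation structure are assumptions, and every hyperedge is assumed to have at least one premise.

```python
import heapq
from itertools import count


def optimal_weights(V, E, label, theory, leaf_fn, edge_fn):
    """Return (best, choice): best[v] is the minimal weight found for a proof
    of label[v]; choice[v] is the last hyperedge used (None for leaves)."""
    best, choice, done = {}, {}, set()
    k = {e: 0 for e in E}            # counter k(e) of processed source vertices
    out = {v: [] for v in V}         # hyperedges in which v occurs as a source
    for e in E:
        for s in e[0]:
            out[s].append(e)
    heap, tie = [], count()          # tie-breaker keeps heap entries comparable
    for v in V:
        if label[v] in theory:       # single-vertex proofs for theory sentences
            best[v], choice[v] = leaf_fn(label[v]), None
            heapq.heappush(heap, (best[v], next(tie), v))
    while heap:
        w, _, v = heapq.heappop(heap)
        if v in done or w > best[v]:
            continue                 # outdated queue entry
        done.add(v)
        for e in out[v]:
            k[e] += 1
            if k[e] == len(e[0]):    # all source vertices have been reached
                S, d = e
                new = edge_fn((frozenset(label[s] for s in S), label[d]),
                              [best[s] for s in S])
                if d not in done and (d not in best or new < best[d]):
                    best[d], choice[d] = new, e
                    heapq.heappush(heap, (new, next(tie), d))
    return best, choice


def extract_proof(v, choice):
    """Collect the vertices and hyperedges of the proof recorded for v."""
    verts, edges, stack = set(), set(), [v]
    while stack:
        u = stack.pop()
        if u in verts:
            continue
        verts.add(u)
        if choice[u] is not None:
            edges.add(choice[u])
            stack.extend(choice[u][0])
    return verts, edges


# Hypothetical derivation structure with two ways to derive A ⊑ D.
THEORY = {"A ⊑ B", "B ⊑ C", "C ⊑ D", "A ⊑ C"}
LABEL = {1: "A ⊑ B", 2: "B ⊑ C", 3: "C ⊑ D", 4: "A ⊑ C", 5: "A ⊑ D", 6: "B ⊑ D"}
V = set(LABEL)
E = [(frozenset({1, 2}), 4), (frozenset({4, 3}), 5),
     (frozenset({2, 3}), 6), (frozenset({1, 6}), 5)]

best_depth, choice_depth = optimal_weights(
    V, E, LABEL, THEORY, lambda a: 0, lambda e, Q: 1 + max(Q))
best_tree, choice_tree = optimal_weights(
    V, E, LABEL, THEORY, lambda a: 1, lambda e, Q: 1 + sum(Q))
print(best_depth[5], best_tree[5], extract_proof(5, choice_depth))
```

Each vertex is finalized once and each hyperedge is evaluated once, so the running time is polynomial in the size of the derivation structure, which is why the encoding of q does not matter here.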

Proof Depth
We now consider the measure m depth in more detail. We can show lower bounds of P and ExpTime for polynomial and exponential derivers, respectively, although the latter only holds for upper bounds q encoded in binary.
Since our definition of OP(D, m) requires that the input entailment T |= η already holds, we cannot use a straightforward reduction from the entailment problem in EL or ELI, however. Instead, we show that ordinary proofs P for T |= η satisfy m(P) ≤ q for some q, and then extend the TBox to T ′ in order to create an artificial proof P ′ with m(P ′ ) > q. In this way, we ensure that T ′ |= η holds and can use q to distinguish the artificial from the original proofs.
For ELI, we can use an observation from [9, Example 6.29] for this purpose. We can now reduce the entailment problems for EL and ELI to obtain the claimed lower bounds.

Theorem 14.
The problems OP unary (Elk, m depth ) and OP binary (Eli, m depth ) are P-hard and ExpTime-hard, respectively.
Proof. For the P-hardness, we provide a LogSpace-reduction from the entailment problem of a GCI A ⊑ B with two concept names A, B from an EL-theory T , which is P-hard [9]. To reduce this problem to OP unary (Elk, m depth ), we need to find a theory T ′ and a number q such that T ′ |= A ⊑ B holds, and moreover T |= A ⊑ B iff there is an admissible proof for T ′ |= A ⊑ B of depth at most q (cf. Lemma 7).
First, observe that, since proofs must be acyclic, the depth of any proof of A ⊑ B from T is bounded by q := |Elk(T , A ⊑ B)|, whose size in unary encoding is polynomial in the size of T . We now construct where A 1 , . . . , A q are concept names that do not occur in T . Furthermore, the existence of an admissible proof for T ′ |= A ⊑ B of depth at most q is equivalent to T |= A ⊑ B, since any proof that uses the new concept names must take q + 1 consecutive steps using rule R ⊑ , i.e. must be of depth q + 1. Moreover, we can compute q (in binary representation) and output it in unary representation using a logarithmically space-bounded Turing machine, and similarly for T ′ . Hence, the above construction constitutes the desired LogSpace-reduction.
For the remaining result, we can use similar arguments about the exponential deriver Eli, where entailment is ExpTime-hard [9]:
- the minimal depth of a proof in an exponential derivation structure is at most exponential, and this exponential bound q can be computed in polynomial time using binary encoding;
- by Proposition 13, there is an ELI theory T of size polynomial in the size of the binary encoding of q such that T |= A ⊑ B and any proof for T |= A ⊑ B must have at least depth q + 1.

□
To demonstrate that the generic upper bounds from Theorem 12 are tight even for unary encoding, we quickly consider the artificial measure m log(depth) (logarithmic depth), which simply computes the (binary) logarithm of the depth of a given proof. This is also a monotone recursive Φ-measure, since the logarithmic depth contains exactly the same information as the depth itself. It is easy to obtain the following lower bounds from the previous results about m depth . Corollary 15. OP unary (Elk, m log(depth) ) is P-hard and OP unary (Eli, m log(depth) ) is ExpTime-hard.
Proof. Regardless of the chosen deriver D, OP binary (D, m depth ) can be LogSpace-reduced to OP unary (D, m log(depth) ), because in order to find a proof of depth at most q (with q given in binary), one can equivalently look for a proof whose logarithmic depth is bounded by the value log q. The unary encoding of log q has the same size as the binary encoding of q and can be computed in LogSpace by flipping all bits of the binary encoding of q to 1.

□
We now return to m depth and cover the remaining case of exponential derivers and unary encoding of the upper bound q. Theorem 16. OP unary (D, m depth ) is in PSpace for any exponential deriver D. It is PSpace-hard for the exponential deriver D = Eli.
Proof. For the upper bound, we employ a depth-first guessing strategy: we guess a proof of depth at most q, where at each time point we only keep one branch of the proof in memory. As the length of this branch is bounded by q, and due to our assumptions on derivers, this procedure only requires polynomial space.
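The depth-first strategy of the upper bound can be mirrored by a deterministic search that replaces guessing with backtracking. This is only a sketch under assumptions: `inferences` is a hypothetical callback that enumerates the premise sets of all hyperedges of the derivation structure ending in a given sentence (the oracle in the text only answers membership queries), and the example data is made up.

```python
def has_proof_of_depth(sentence, q, theory, inferences):
    """True iff some proof of `sentence` has depth at most q.
    The recursion stack corresponds to the single branch kept in memory,
    and its height is bounded by q."""
    if sentence in theory:
        return True   # a single leaf is a proof of depth 0
    if q == 0:
        return False  # no inference step left in the budget
    return any(
        all(has_proof_of_depth(p, q - 1, theory, inferences) for p in premises)
        for premises in inferences(sentence)
    )


# Hypothetical derivation structure, indexed by conclusion.
THEORY = {"A ⊑ B", "B ⊑ C", "C ⊑ D"}
INFERENCES = {
    "A ⊑ C": [("A ⊑ B", "B ⊑ C")],
    "B ⊑ D": [("B ⊑ C", "C ⊑ D")],
    "A ⊑ D": [("A ⊑ C", "C ⊑ D"), ("A ⊑ B", "B ⊑ D")],
}


def inferences(sentence):
    """Premise tuples of all hyperedges whose target is labeled `sentence`."""
    return INFERENCES.get(sentence, [])


print(has_proof_of_depth("A ⊑ D", 1, THEORY, inferences),
      has_proof_of_depth("A ⊑ D", 2, THEORY, inferences))
```

Because the budget q decreases with every recursive call, the search terminates even if the derivation structure contains cycles; its running time, unlike its memory use, may be exponential.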
For the lower bound, we provide a reduction from the PSpace-complete QBF problem (satisfiability of quantified Boolean formulas). Let Q 1 x 1 Q 2 x 2 . . . Q m x m .φ be a quantified Boolean formula, where for i ∈ {1, . . . , m}, Q i ∈ {∃, ∀}, and φ is a formula over {x 1 , . . . , x m }. We assume φ to be in negation normal form, that is, negation only occurs directly in front of a variable. We construct an ELI theory T and a number q, both of size polynomial in the size of the formula, such that T |= A ⊑ B holds (cf. Definition 6) and T has a proof for A ⊑ B of depth q iff the QBF formula is valid. We use two roles r 1 , r 2 to deal with the variable valuations, concept names A 0 , . . ., A m to count the quantifier nesting, and a concept name A ψ for every sub-formula ψ of φ. In addition, we use the concept names A and B occurring in the conclusion, and two concept names B 1 and B 2 .
The concept name A initializes the formula at quantifier nesting level 0: For every i ∈ {1, . . . , m}, T contains the following sentence to select a truth valuation for x i , increasing the nesting depth in each step.
To ensure truth valuations are kept along the role-successors, we use the following sentences for every l ∈ {x i , ¬x i | 1 ≤ i ≤ m}: The following GCIs are now used to evaluate φ. For every conjunction ψ = ψ 1 ∧ψ 2 occurring in φ, we use: and for every disjunction ψ = ψ 1 ∨ ψ 2 , we use: Finally, the following GCIs are used to propagate the result of the evaluation back towards the start.
One can now show that there exists a proof for A ⊑ B from T of depth at most q iff the QBF formula is valid, where q is polynomial and determined by the size and structure of φ. Finally, we can extend T with the sentences from Proposition 13 to ensure that T |= A ⊑ B holds while retaining this equivalence.

The Tree Size Measure
The tree size measure was discussed already in [2], where tight bounds were provided for polynomial derivers and exponential derivers with unary encoding.

Proof (sketch). We describe a non-deterministic procedure for OP binary (D, m tree ), in polynomial space. Let T be a theory, η the goal sentence, and q a rational number in binary encoding. By Lemma 7, it suffices to find a proof P for T |= η in D(T , η) with m tree (P) ≤ q. The procedure guesses such a proof starting from the conclusion, while keeping in memory a set S of tuples (η ′ , q ′ ), where η ′ is a sentence and q ′ ≤ q a rational number. Intuitively, such a tuple states: "We still need to guess a proof for η ′ of tree size at most q ′ ."
1. Initialize S := {(η, q)}.
2. While S ≠ ∅,
(a) select from S a tuple (η ′ , q ′ ) such that for all tuples (η ′′ , q ′′ ) ∈ S it holds that q ′′ ≥ q ′ ;
(b) guess a hyperedge ({v 1 , . . . , v m }, v ′ ) in D(T , η) (using the oracle access described in Section 2.2) and m numbers q 1 , . . ., q m , such that ℓ(v ′ ) = η ′ and q 1 + . . . + q m + 1 ≤ q ′ ; and
(c) replace (η ′ , q ′ ) in S by the tuples (ℓ(v 1 ), q 1 ), . . ., (ℓ(v m ), q m ).
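The budget constraint q 1 + . . . + q m + 1 ≤ q ′ in step 2(b) can be made concrete with a small deterministic search that passes the remaining budget from premise to premise instead of guessing the split. This sketch does not implement the polynomial-space procedure itself: it uses plain recursion rather than the smallest-first selection of step 2(a), its recursion depth can be as large as the budget, and the callback `inferences` as well as the example data are hypothetical.

```python
def min_tree_size(sentence, budget, theory, inferences):
    """Smallest tree size of a proof of `sentence` that does not exceed
    `budget`, or None if no such proof exists."""
    if budget < 1:
        return None          # even a single leaf needs one unit
    if sentence in theory:
        return 1             # a leaf has tree size 1
    best = None
    for premises in inferences(sentence):
        remaining = budget - 1   # one unit is spent on the conclusion vertex
        total = 1
        for p in premises:
            w = min_tree_size(p, remaining, theory, inferences)
            if w is None:        # budget exhausted on this hyperedge
                total = None
                break
            remaining -= w
            total += w
        if total is not None and (best is None or total < best):
            best = total
    return best


# Hypothetical derivation structure, indexed by conclusion.
THEORY = {"A ⊑ B", "B ⊑ C", "C ⊑ D"}
INFERENCES = {
    "A ⊑ C": [("A ⊑ B", "B ⊑ C")],
    "B ⊑ D": [("B ⊑ C", "C ⊑ D")],
    "A ⊑ D": [("A ⊑ C", "C ⊑ D"), ("A ⊑ B", "B ⊑ D")],
}


def inferences(sentence):
    """Premise tuples of all hyperedges whose target is labeled `sentence`."""
    return INFERENCES.get(sentence, [])


print(min_tree_size("A ⊑ D", 4, THEORY, inferences),
      min_tree_size("A ⊑ D", 5, THEORY, inferences))
```

Since the budget strictly decreases in every recursive call, the search terminates on cyclic derivation structures as well; the answer to the decision problem for threshold q is positive exactly when the function returns a number.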
There is a proof for T |= η of tree size at most q iff every step in the algorithm is successful. To show that it only requires polynomial space, we show that during the computation, the number of elements in S is always polynomially bounded. For this, we show that the elements in S can always be organized into a tree with the following properties:
S1 the root is labeled with ε,
S2 every other node is labeled with a distinct element from S,
S3 every node that is not the root or a leaf has at least 2 children,
S4 every node has at most p children, where p is the maximal number of premises in any inference in D(T , η), which we assumed to be polynomial in the input,
S5 every node (η ′ , q ′ ) has at most 1 child (η ′′ , q ′′ ) that is not a leaf and for this child it holds that q ′′ < q ′ /2,
S6 for every node labeled (η ′ , q ′ ) with children labeled (η 1 , q 1 ), . . ., (η m , q m ), we have q 1 + . . . + q m < q ′ .
We prove this by induction on the steps of the algorithm, where in each step, we either replace one tuple in the tree, or put the new tuples under the leaf with the currently smallest value (see Fig. 5). By S3 and because every number in S is bounded by q, we can show that the tree has depth at most log 2 q, which with S4 and S5 implies that it has at most p · log 2 q nodes. S2 then implies that |S| ≤ p · log 2 q is always satisfied, and thus that S is polynomially bounded. □

A corresponding lower bound can be found for the exponential deriver Eli by a reduction of the word problem for deterministic Turing machines with polynomial space bound.

Theorem 18. For the exponential deriver
Proof (sketch). Let T = (Q, Γ, ␢, Σ, δ, q 0 , F ) be a deterministic Turing machine, where Q is the set of states, Γ the tape alphabet, ␢ ∈ Γ the blank symbol, Σ ⊆ Γ the input alphabet, δ : Q × Γ → Q × Γ × {−1, 0, +1} the partial transition function, q 0 the initial state, and F ⊆ Q the accepting states. We assume that T is polynomially space bounded, that is, there is a polynomial p such that on input words w ∈ Σ * , T only accesses the first p(|w|) cells of the tape. For a word w, we denote by w[i] its ith letter. For some fixed word w, we construct a theory T using the following names, where k = p(|w|):
- Start marks the initial and Accept an accepting configuration;
- to denote that we are in state q ∈ Q, we use a concept name S q ;
- for every a ∈ Γ and i ∈ {0, . . . , k}, we use a concept name A a i denoting that the letter a is on tape position i;
- for every i ∈ {0, . . . , k}, we use the concept name P + i to denote that the head is currently on position i, and P − i to denote that it is not;
- the role r is used to express the transitions between the configurations.
For convenience, we present the theory not in the required normal form, but aggregate conjunctions on the right. The following sentence describes the initial configuration.
The transition from one configuration to the next is encoded with the following sentences for every i ∈ {0, . . . , k} and every (q, a) ∈ Q×Γ with δ(q, a) = (q ′ , b, d):

Finally, we use the following sentences to detect accepting configurations and propagate the information of acceptance back to the initial configuration:

Accept ⊑ ∀r − .Accept (13)

One can find a number q exponential in k and the size of T s.t. there is a proof for T |= Start ⊑ Accept with tree size at most q iff T accepts w. Using Proposition 13, we can extend T to a theory T ′ s.t. T ′ |= Start ⊑ Accept, while a proof of tree size q exists iff T accepts w (observe that m tree (P) ≥ m depth (P) holds for all proofs P). □

Conclusion
We have investigated the complexity of finding optimal proofs w.r.t. quality measures that satisfy the property of being monotone recursive. Two important examples of this class of measures, depth and tree size, have been considered in detail in combination with exponential and polynomial derivers. The obtained results are promising: given a deriver, the search for an optimal proof for an entailment can be easier than producing all of the proofs by this deriver. The algorithms used to show the upper bounds can serve as building blocks for finding an optimal proof w.r.t. a monotone recursive measure automatically. We conjecture that weighted versions of tree size and depth, where sentences or inference steps can have associated rational weights, are also monotone recursive, and the generic upper bounds established in this paper can be straightforwardly applied to them. However, a more thorough study is required here, since the complexity of the decision problem depends on the exact way in which the weights are employed. This step towards weighted measures is motivated by user studies [1,15,24], demonstrating that different types of sentences and logical inferences can be more or less difficult to understand.

A vertex v ∈ V is called a leaf if it has no incoming hyperedges, i.e. there is no (S, v) ∈ E; and v is a sink if it has no outgoing hyperedges, i.e. there is no (S, d) ∈ E such that v ∈ S. We denote the set of all leaves and the set of all sinks in H as leaf (H) and sink(H), respectively.
In this case, we also say that H contains H ′ and write H ′ ⊆ H. Given two hypergraphs H 1 = (V 1 , E 1 , ℓ 1 ) and Definition 20 (Cycle, Tree). Given a hypergraph H = (V, E, ℓ) and s, t ∈ V , a path P of length q ≥ 0 in H from s to t is a sequence of vertices and hyperedges where d 0 = s, d q = t, and d j−1 ∈ S j for all j, 1 ≤ j ≤ q. By |P | we denote the length of a path P . If there is such a path of length q > 0 in H, we say that t is reachable from s in H. If t = s, then P is called a cycle. The hypergraph H is acyclic if it does not contain a cycle. The hypergraph H is connected if every vertex is connected to every other vertex by a series of paths and reverse paths.
A hypergraph H = (V, E, ℓ) is called a tree with root t ∈ V if t is reachable from every vertex v ∈ V \ {t} by exactly one path. In particular, the root is the only sink in a tree, and all trees are acyclic and connected.
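The notions of leaf, sink, and acyclicity can be illustrated with a small Python sketch. The representation is an assumption made for illustration only: a hyperedge (S, d) is stored as a pair of a frozenset of premises and a conclusion, labels are omitted, and the function names `leaves`, `sinks`, and `is_acyclic` are hypothetical. Since a path steps from some d_{j−1} ∈ S_j to d_j, acyclicity is checked on the directed graph that has an arc from every premise to the conclusion of its hyperedge.

```python
from typing import FrozenSet, Hashable, List, Set, Tuple

Vertex = Hashable
Hyperedge = Tuple[FrozenSet[Vertex], Vertex]  # (S, d): premises S, conclusion d


def leaves(vertices: Set[Vertex], edges: List[Hyperedge]) -> Set[Vertex]:
    """Vertices v for which there is no hyperedge (S, v)."""
    conclusions = {d for _, d in edges}
    return {v for v in vertices if v not in conclusions}


def sinks(vertices: Set[Vertex], edges: List[Hyperedge]) -> Set[Vertex]:
    """Vertices v that occur in no premise set S of a hyperedge (S, d)."""
    premises = {s for S, _ in edges for s in S}
    return {v for v in vertices if v not in premises}


def is_acyclic(vertices: Set[Vertex], edges: List[Hyperedge]) -> bool:
    """Topological-sort check on the arcs s -> d for s in S, (S, d) in E."""
    succ = {v: set() for v in vertices}
    indeg = {v: 0 for v in vertices}
    for S, d in edges:
        for s in S:
            if d not in succ[s]:
                succ[s].add(d)
                indeg[d] += 1
    ready = [v for v in vertices if indeg[v] == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    # Every vertex is visited exactly when no cycle blocks the sort.
    return visited == len(vertices)


# Hypothetical example: c is derived from a and b, and x is derived from c.
V = {"a", "b", "c", "x"}
E = [(frozenset({"a", "b"}), "c"), (frozenset({"c"}), "x")]
print(leaves(V, E), sinks(V, E), is_acyclic(V, E))
```

Adding a hyperedge ({x}, a) to this example closes a cycle, so the check then fails.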

Definition 21 (Homomorphism). Let
Such an h is an isomorphism if it is a bijection, and its inverse, h⁻¹ : H′ → H, is also a homomorphism.

Definition 22 (Hypergraph Unraveling). The unraveling of an acyclic hypergraph H = (V, E, ℓ) at a vertex v ∈ V is the tree H_T = (V_T, E_T, ℓ_T), where V_T consists of v as well as all paths in H that end in v, E_T contains all hyperedges ({P_1, . . . , P_n}, P) (resp. ({P_1, . . . , P_n}, v)) where each P_i is of the form (d_i, (S, d)) · P (resp. (d_i, (S, v), v)) such that S = {d_1, . . . , d_n}, ℓ_T(v) = ℓ(v), and ℓ_T(P) is the label of the starting vertex of P in H.
Moreover, the mapping h_T : V_T → V that maps each path to its starting vertex and v to itself is a homomorphism from H_T to H.
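The unraveling can be sketched in Python as follows. This is only an illustration under stated assumptions: hyperedges are pairs of a frozenset of premises and a conclusion, a tree vertex is a tuple that spells out a path ending in v, the root is written as the one-element tuple (v,), and the names `unravel` and `h_T` are hypothetical. The loop terminates only for acyclic input.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

Vertex = Hashable
Hyperedge = Tuple[FrozenSet[Vertex], Vertex]  # (S, d)


def unravel(edges: List[Hyperedge], labels: Dict[Vertex, str], v: Vertex):
    """Unravel an acyclic hypergraph at v; returns tree labels and tree edges."""
    incoming: Dict[Vertex, List[Hyperedge]] = {}
    for edge in edges:
        incoming.setdefault(edge[1], []).append(edge)

    root = (v,)                       # stands for the vertex v itself
    tree_labels = {root: labels[v]}   # keys are exactly the tree vertices
    tree_edges = []
    todo = [root]
    while todo:
        path = todo.pop()
        start = path[0]               # starting vertex of this path in H
        for edge in incoming.get(start, []):
            # Each premise s yields the longer path (s, (S, start)) . path
            children = frozenset((s, edge) + path for s in edge[0])
            for child in children:
                tree_labels[child] = labels[child[0]]
                todo.append(child)
            tree_edges.append((children, path))
    return tree_labels, tree_edges


def h_T(tree_vertex):
    """Map a path to its starting vertex (and the root to v)."""
    return tree_vertex[0]


# Hypothetical example: the vertex a is shared by the derivations of b and c.
E = [(frozenset({"a"}), "b"), (frozenset({"a"}), "c"),
     (frozenset({"b", "c"}), "x")]
L = {"a": "A", "b": "B", "c": "C", "x": "X"}
tree_labels, tree_edges = unravel(E, L, "x")
copies_of_a = [p for p in tree_labels if h_T(p) == "a"]
print(len(tree_labels), len(tree_edges), len(copies_of_a))
```

In this example the shared vertex a is duplicated, so the tree has one more vertex than the original hypergraph, and both copies are mapped back to a by `h_T`.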
The tree in Figure 3 represents the unraveling of the hypergraph from Figure 2.

Proof. The first statement trivially follows from the acyclicity of P and the fact that v_η is its only sink. The length of a path in P can be bounded by |V|.
The second claim can be shown by induction on the depth of P. Namely, for every k and every w ∈ V s.t. all paths leading to w have length at most k, it holds that T |= ℓ(w). The induction base follows from the fact that the leaves are labeled with sentences from T. For the induction step, for a vertex w ∈ V, we consider a hyperedge (S, w) ∈ E. Every s ∈ S satisfies the induction hypothesis and, thus, T |= ℓ(s). Since P is a derivation structure, it holds that {ℓ(s) | s ∈ S} |= ℓ(w) and, by transitivity of model-based entailment, T |= ℓ(w).

□
We now show that Definition 5 is more general than the similar definition of Ψ-measures in [3], and in particular now also covers the measure depth.

Definition 23 ([3]). A measure m is a Ψ-measure if, for every P ∈ P_L,
[P] m(P) is computable in polynomial time in the size of P, and
[SI] every subproof of a homomorphic image of P weighs no more than P, i.e. m(P′′) ≤ m(P) for any homomorphism h : P → P′ and P′′ ⊆ h(P) such that P′′ ∈ P_L.

Lemma 24.
For proofs according to Definition 2, every Ψ-measure (as introduced in [3]) is a Φ-measure, but not vice versa.
Proof. This follows trivially since [HI] requires only that the minimal subproofs of the homomorphic image weigh no more than P. However, in contrast to [3], in this paper we require every vertex in a proof in Definition 2 to have at most one incoming edge. Thus, rigorously speaking, measures in this paper may be undefined for some proof hypergraphs from [3]. Moreover, depth is a Φ-measure (see Lemma 10) but not a Ψ-measure (see Lemma 8 in [3]).
□

For the following proof, we define

Intuitively, we obtain P_{−v} by removing the proof of v, i.e. P_v, from P. Therefore, for every w ∈ V_v where w = v, and every (S, d) since v is now a leaf, but ℓ(v) may not be a sentence from T.

Proof. Let P be such a proof with associated homomorphism h : P → D(T, η).
First, we show that there is a subproof for T |= η in the homomorphic image. If h(P) is acyclic and every vertex has at most one incoming edge, then we already found one subproof. Since P has a unique sink v η , it must be mapped to a unique sink h(v η ) in h(P), and thus h(P) is the desired subproof of D(T , η).
If h(P) is not acyclic or there is a vertex with more than one incoming edge, our goal is to find another admissible proof P * w.r.t. D(T , η) that uses a subset of the vertices of P such that h(P * ) ⊆ h(P) is acyclic with only one incoming edge for any vertex. For this purpose, first consider an arbitrary cycle in h(P), which must be due to two vertices v, v ′ in P such that h(v) = h(v ′ ) and there is a path between v and v ′ (or due to multiple such pairs of vertices). Since P is acyclic, we can assume that there is a path from v to v ′ , but no path from v ′ to v. We now consider the two subproofs P v and P v ′ . As there is a path from v to v ′ , we have P v ⊂ P v ′ . Since h(v) = h(v ′ ), both vertices are labeled with the same sentence. The idea of the following construction is to remove P v ′ from P and replace it with P v , which effectively removes all paths from v to v ′ .
More formally, we first consider the hypergraph H = P_{−v′} ∪ P_v and then, in the hyperedges (S, d) in H that still contain v′ ∈ S, we replace v′ by v, effectively merging the two vertices, remove v′ from the set of vertices, and thus obtain a hypergraph P′. If there was no such hyperedge, then v′ was the sink of P, i.e. ℓ(v′) = η, and v will now be the new sink in P′.

For a vertex w in h(P) with more than one incoming edge, again there must be two vertices v, v′ in P such that h(v) = h(v′) = w. We can thus apply the same procedure as above. However, since P_v ⊂ P_{v′} may not hold, it does not matter which of the subproofs P_v, P_{v′} is replaced by the other.
We now show that P ′ is also an admissible proof w.r.t. D(T , η). Our construction does not produce new leaves, and hence P ′ is still grounded. Clearly, all remaining edges are sound since they were already sound in P. Moreover, P ′ is acyclic and every vertex has only one incoming edge since all edges and cycles in P ′ can be traced back to paths in P that involve both v and v ′ ; but we have assumed that there are no paths from v ′ to v, and have destroyed all paths from v to v ′ . As argued above, we have also kept the property that there is exactly one sink, which is labeled with η. Observe that h is also a homomorphism from P ′ to D(T , η) (when restricted to the vertices of P ′ ), because h(v) = h(v ′ ), and moreover h(P ′ ) ⊆ h(P).
This means that, after finitely many such operations, we can obtain from P the desired proof P * such that h(P * ) ⊆ h(P) is acyclic with every vertex having at most one incoming edge. Since h(P * ) also has a unique sink labeled by η, it is a subproof with sink h(v η ) in D(T , η).
Second, consider the set Q of all possible subproofs with sink h(v η ) in D(T , η). As shown above, Q is non-empty. Thus, if the proof h(P * ) obtained in the previous step is minimal w.r.t. m, then m(h(P * )) ≤ m(P) ≤ q by [HI] in Definition 5. Otherwise, there is another subproof Q ∈ Q with sink h(v η ) in D(T , η), s.t. m(Q) < m(h(P * )) and its weight is minimal w.r.t. m among Q. Then, again, m(Q) ≤ m(P) ≤ q by [HI] in Definition 5.
□

An interesting property of recursive measures is that they are invariant under unraveling (see Definition 22).

Lemma 25.
Let m be a recursive Φ-measure, P be a proof for T |= η and P T its unraveling into a tree (starting at the sink). Then m(P) = m(P T ).
Proof. We show this by induction on the depth of P. If P contains only one vertex, then P_T = P, and thus the claim is trivial. If the longest path in P has length n, assume that the claim holds for all proofs of depth at most n − 1. Consider the unique hyperedge (S, v) ∈ P that leads to the sink v of P. Then P_T is isomorphic to the union of the unravelings P_{w,T} of all P_w with w ∈ S, together with the hypergraph that contains only the edge (S, v) (if we identify the paths (w, (S, v), v) with the vertices w). By induction, m(P_w) = m(P_{w,T}) since each P_w is of depth at most n − 1. We obtain that m(P) = m(P_T), since m is recursive and its value at the sink is therefore determined by the hyperedge (S, v) and the values m(P_w), w ∈ S.

Proof. Both measures are monotone recursive by definition and can be computed in polynomial time in the size of the input proof.
For tree size, we consider a homomorphism h : P → H and a vertex w in P. By the procedure described in the proof of Lemma 7, there exists a proof P * w with edges from P (modulo renamed vertices) s.t. h(P * w ) ⊆ h(P) is a proof with sink h(w). It is not hard to see that, by construction, m tree (P * w ) ≤ m tree (P), since we replace subproofs P v ′ with P v , where P v ⊂ P v ′ and m tree (P v ) < m tree (P v ′ ). Since h(P * w ) is a proof, m tree (h(P * w )) is defined. The property of any proof vertex having only one incoming edge guarantees that h(P * w ) and P * w are isomorphic. Since homomorphisms preserve edges, m tree (P * w ) = m tree (h(P * w )). Thus, for every vertex w in P, there is a proof with sink h(w) in h(P) of tree size no greater than m tree (P). Trivially, every minimal (w.r.t. m tree ) subproof with sink h(w) weighs no more than m tree (h(P * w )). Every vertex in h(P) has a pre-image in P and, therefore, [HI] holds for tree size.
For depth, we can use a similar argument. The process of replacing subproofs P_{v′} with their smaller alternatives P_v ⊂ P_{v′} also results in a non-increasing depth for P*_w. As a consequence, m_depth(h(P*_w)) ≤ m_depth(P) for h(P*_w) ⊆ h(P).

□

Lemma 11. For any monotone recursive Φ-measure m and deriver D, Algorithm 1 computes an optimal proof in time polynomial in the size of D(T, η).
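To make the kind of search behind Lemma 11 concrete, the following Python sketch implements a Dijkstra-style procedure over a derivation structure. It is not a transcription of Algorithm 1, and several simplifications are assumptions made here: it stores only weights and the last hyperedge used for each vertex instead of whole proofs P(v), it takes the measure as a hypothetical pair `leaf_weight` and `edge_weight`, and it assumes that `edge_weight` is monotone and never returns less than any of its premise weights (as is the case for tree size). Because a hyperedge is only used once all of its premises have been finalised, the recorded hyperedges cannot form a cycle in this simplified variant.

```python
import heapq
from typing import Callable, Dict, FrozenSet, Hashable, List, Optional, Tuple

Vertex = Hashable
Hyperedge = Tuple[FrozenSet[Vertex], Vertex]  # (S, d)


def optimal_weights(axioms: List[Vertex], edges: List[Hyperedge],
                    leaf_weight: float,
                    edge_weight: Callable[[List[float]], float]):
    """Return (final weight per vertex, index of the hyperedge used last)."""
    final: Dict[Vertex, float] = {}
    tentative: Dict[Vertex, float] = {}
    chosen: Dict[Vertex, Optional[int]] = {}   # None marks a theory leaf
    missing = [len(S) for S, _ in edges]       # premises not yet finalised
    uses: Dict[Vertex, List[int]] = {}
    for i, (S, _) in enumerate(edges):
        for s in S:
            uses.setdefault(s, []).append(i)
    heap: List[Tuple[float, int, Vertex]] = []
    tick = 0                                   # tie-breaker for the heap

    def offer(v: Vertex, w: float, e: Optional[int]) -> None:
        nonlocal tick
        if v not in final and (v not in tentative or w < tentative[v]):
            tentative[v], chosen[v] = w, e
            heapq.heappush(heap, (w, tick, v))
            tick += 1

    for v in axioms:                           # leaves labeled by the theory
        offer(v, leaf_weight, None)
    for i, (S, d) in enumerate(edges):         # inferences without premises
        if not S:
            offer(d, edge_weight([]), i)

    while heap:
        w, _, v = heapq.heappop(heap)
        if v in final or tentative[v] != w:
            continue                           # outdated queue entry
        final[v] = w
        for i in uses.get(v, []):
            missing[i] -= 1
            if missing[i] == 0:                # all premises are finalised
                S, d = edges[i]
                offer(d, edge_weight([final[s] for s in S]), i)
    return final, chosen


# Hypothetical example with tree size: a leaf weighs 1, an inference step
# weighs 1 plus the sum of the weights of its premises.
E = [(frozenset({"a", "b"}), "c"), (frozenset({"c"}), "d"),
     (frozenset({"a"}), "d"), (frozenset({"c", "d"}), "e")]
final, chosen = optimal_weights(["a", "b"], E, 1, lambda ws: 1 + sum(ws))
print(final["e"], chosen["d"])
```

In the example, d can be derived from c or directly from a; the sketch keeps the cheaper derivation from a, and the weight of e is computed from the finalised weights of c and d.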
Proof. We can show the following facts about this algorithm.
(I) Whenever P(v) is defined, then it is a proof for ℓ(v) contained in D(T , η).
We prove this by induction on the order in which the hypergraphs P(v) are constructed by Algorithm 1. The ones in Line 5 consist of a single leaf v, which is labeled by a theory sentence, and hence are sound, grounded, acyclic, and have the single sink v. Similarly, P(v) in Line 7 is always a proof since it consists of a single edge from D(T, η), has no leaves, and has v as the only sink. Consider now the hypergraph P constructed in Line 16 as a possible candidate for P(v) (where v = d). At this point, all P(s), s ∈ S, are already defined since the counter k(e) can only reach |S| if each s ∈ S has already been chosen in Line 11, and thus P(s) must have been defined. Hence, by induction, each P(s) is a proof for ℓ(s) contained in D(T, η), and, because we assume that D(T, η) contains no two vertices with the same label, must have s as sink. This shows that the hypergraph P constructed in Line 16 is sound, grounded, and has a single sink, namely v. Finally, P(v) is only updated to P in Line 19 if P is acyclic and therefore it is a proof.

(II) If vertex v is chosen before vertex w in Line 11, then m(P(v)) ≤ m(P(w)).
We show that after choosing v in Line 11 the algorithm cannot produce a new proof P in Line 16 with m(P) < m(P(v)), and thus the smallest weight min{m(P(w)) | w ∈ Q} can never decrease. Consider the proof P from Line 17. Since v ∈ S, we have P(v) ⊂ P, and therefore m(P(v)) ≤ m(P) by [HI].

(III) Algorithm 1 terminates in polynomial time.
Item (II) implies that each vertex v ∈ V can be removed from Q at most once: in order for v to be added again to Q in Line 19, there would need to exist a proof other than P(v) with the same sink v but a smaller weight, but according to (II), after choosing v in Line 11, the algorithm does not construct any proofs with a weight smaller than m(P(v)) (for any sink). Therefore, during the complete run of the algorithm, each edge e = (S, d) ∈ E will be used at most once in Line 16. Moreover, all primitive operations in the algorithm can be done in polynomial time, such as checking acyclicity of hypergraphs in Line 17 or finding the minimal value m(P(v)) in Line 11. It follows that Algorithm 1 terminates in time polynomial in the size of D(T , η).
(IV) Every vertex v ∈ V that is the sink of a proof P contained in D(T , η) is added to Q at some point.
We prove this by induction on the structure of P. If P contains only v, then either ℓ(v) ∈ T, and hence v is added to Q in Line 5, or otherwise there is an edge (∅, v) in P (and hence in E), in which case v is added to Q in Line 7. If P has more than one vertex, then it must contain at least one edge e = (S, v) ∈ E, where each s ∈ S is the sink of a subproof of P in D(T, η). By induction we know that each s ∈ S is added to Q at some point during the algorithm. By Item (III), they must also be removed from Q at some point afterwards, and hence eventually k(e) reaches |S| in Line 15. If P(v) was already defined at this point, then v had already been added to Q earlier. Otherwise, v is now added to Q in Line 19.

(V) When Algorithm 1 terminates and P(v) is defined, then m(P(v)) is minimal among all proofs for ℓ(v) contained in D(T, η).
By (I), P(v) is a proof of this form. Assume to the contrary that there is a proof P for ℓ(v) contained in D(T , η) such that m(P) < m(P(v)).
Then P and P(v) must both have the sink v, because we assume that D(T, η) contains no two vertices with the same label. Assume moreover that i) P is an optimal proof for ℓ(v) in D(T, η), that is, m(P) ≤ m(P′) for every other proof P′ for ℓ(v) in D(T, η) (cf. Corollary 8), and ii) among all other vertices v′ ∈ V and all proofs P′ for ℓ(v′) in D(T, η) such that m(P′) < m(P(v′)), we also have m(P) ≤ m(P′) and whenever m(P) = m(P′), then |P| ≤ |P′|. Consider the unique last inference step (S, v) in P. We show that, for every vertex w ∈ S, an optimal proof was assigned to P(w) before v was chosen in Line 11. First, P_w is a proof of ℓ(w) contained in D(T, η), and hence by (IV) P(w) must be defined when Algorithm 1 terminates. Next we show that m(P(w)) ≤ m(P_w). Assume to the contrary that m(P_w) < m(P(w)), i.e. w and P_w satisfy the precondition in Assumption ii), and thus m(P) ≤ m(P_w) must hold. However, by [HI], we have m(P_w) ≤ m(P) (observe that m(P_w) must be minimal among the subproofs of P_w with sink w since otherwise m(P) would not be minimal, because m is monotone recursive). Thus, Assumption ii) also yields that |P| ≤ |P_w|, which contradicts the fact that P_w is a subproof of P. We obtain that m(P(w)) ≤ m(P_w) ≤ m(P) < m(P(v)),

We first note that T has at most |Q| · |Γ|^k different configurations, and thus an accepting run involves m ≤ |Q| · |Γ|^k steps. Let Conf_0, . . ., Conf_m denote the sequence of conjunctions representing those configurations, where Conf_0 represents the initial configuration and Conf_m the final one. For j ∈ {0, . . . , m − 1}, the transition from one configuration to the next is encapsulated in an entailment of the form Conf_j ⊑ ∃r.Conf_{j+1}. Those entailments are inferred as follows with the calculus:
- (2 + k + 1) + k = 2k + 3 times we have to apply CR1 and then CR2 to obtain sentences as in (10) and (11) but with the entire configuration encoding Conf_j on the left-hand side. For each of the 2 + k + 1 sentences corresponding to (10), this gives a tree of size 5 (3 times we apply CR1 without a premise to get the sentences Conf_j ⊑ A, where A is an atom on the left-hand side of (10), followed by one application of CR2 with a sentence corresponding to (10)). For the k sentences in (11), this requires a tree of size 4 (the same argument as before, but now with one atom less on the left-hand side of (11)).
- The resulting 2k + 3 sentences are then step-wise combined using CR4 to obtain the desired entailment with Conf_{j+1} under the role restriction. This generates 2k + 1 intermediate conclusions. Together with the final conclusion Conf_j ⊑ ∃r.Conf_{j+1}, this makes 2k + 2 additional tree vertices in total.
Consequently, each inference of Conf_j ⊑ ∃r.Conf_{j+1} is generated by a proof of tree size (2 + k + 1) · 5 + k · 4 + (2k + 2) = 11k + 17. The complete tree proof for Start ⊑ Accept is now obtained by first generating Conf_m ⊑ Accept for the final configuration. This involves 3 vertices, the first by using CR1 to generate Conf_m ⊑ S_f, where f is the accepting state of the configuration, and then using CR2 with Sentence (12) to generate Conf_m ⊑ Accept. From here, we follow the sequence of configurations backwards, each time inferring from Conf_j ⊑ Accept and (13) the sentence Conf_j ⊑ ∀r⁻.Accept (increasing the tree size by 2), and then using this sentence together with Conf_{j−1} ⊑ ∃r.Conf_j via CR3 to get Conf_{j−1} ⊑ Accept (tree size increased by 11k + 17). Finally, from Conf_0 ⊑ Accept we get to Start ⊑ Accept using the 2k + 3 sentences corresponding to (9) and CR2. We obtain that the entire tree proof requires at most 1 + (2k + 3) + m · (3 + 2 + (11k + 17)) ≤ (m + 1) · (11k + 22) ≤ (|Q| · |Γ|^{p(|w|)} + 1) · (22 + 11p(|w|)) vertices. Note that this number can be encoded using polynomially many bits.
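The counting argument above can be checked mechanically. The following Python sketch recomputes the tree size of one transition proof and the overall bound from the individual contributions named in the text; the function names are hypothetical, and k and m are treated as plain non-negative integers.

```python
def transition_tree_size(k: int) -> int:
    """Tree size of one proof of Conf_j ⊑ ∃r.Conf_{j+1}."""
    from_10 = (2 + k + 1) * 5   # trees of size 5 for sentences as in (10)
    from_11 = k * 4             # trees of size 4 for sentences as in (11)
    combine = 2 * k + 2         # CR4 steps: 2k + 1 intermediate + 1 final
    return from_10 + from_11 + combine


def total_tree_size(k: int, m: int) -> int:
    """Upper bound on the size of the tree proof for Start ⊑ Accept."""
    return 1 + (2 * k + 3) + m * (3 + 2 + transition_tree_size(k))


# Check the closed forms for a range of small parameter values.
for k in range(0, 50):
    assert transition_tree_size(k) == 11 * k + 17
    for m in range(0, 50):
        assert total_tree_size(k, m) <= (m + 1) * (11 * k + 22)
```

The second assertion holds because 1 + (2k + 3) = 2k + 4 is never larger than 11k + 22, so the part that does not depend on m is absorbed by one additional factor of 11k + 22.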