Learning from Łukasiewicz and Meredith: Investigations into Proof Structures (Extended Version)

The material presented in this paper contributes to establishing a basis deemed essential for substantial progress in Automated Deduction. It identifies and studies global features in selected problems and their proofs which offer the potential of guiding proof search in a more direct way. The studied problems are of the widespread form of "axiom(s) and rule(s) imply goal(s)". The features include the well-known concept of lemmas. For their elaboration both human and automated proofs of selected theorems are taken into a close comparative consideration. The study at the same time accounts for a coherent and comprehensive formal reconstruction of historical work by Łukasiewicz, Meredith and others. First experiments resulting from the study indicate novel ways of lemma generation to supplement automated first-order provers of various families, strengthening in particular their ability to find short proofs.


Introduction
Research in Automated Deduction, also known as Automated Theorem Proving (ATP), has resulted in systems with a remarkable performance. Yet, deep mathematical theorems or otherwise complex statements still withstand any of the systems' attempts to find a proof. The present paper is motivated by the thesis that the reason for the failure in more complex problems lies in the local orientedness of all our current methods for proof search like resolution or connection calculi in use.
In order to find out more global features for directing proof search we start out here to study the structures of proofs for complex formulas in some detail and compare human proofs with those generated by systems. Complex formulas of this kind have been considered by Łukasiewicz in [21]. They are complex in the sense that current systems require tens of thousands or even millions of search steps for finding a proof if any, although the length of the formulas is very short indeed. How come that Łukasiewicz found proofs for those formulas although he could never carry out more than, say, a few hundred search steps by hand? Which global strategies guided him in finding those proofs? Could we discover such strategies from the formulas' global features?
By studying the proofs in detail we hope to come closer to answers to those questions. Thus it is proofs, rather than just formulas or clauses as usual in ATP, which are the focus of our study. In a sense we are aiming at an ATP-oriented part of Proof Theory, a discipline usually pursued in Logic yet under quite different aspects. This meta-level perspective has rarely been taken in ATP, for which reason we cannot rely on the existing conceptual basis of ATP but have to build an extensive conceptual basis for such a study more or less from scratch.
This investigation thus analyzes structures of, and operations on, proofs for formulas of the form "axiom(s) and rule(s) imply goal(s)". It renders condensed detachment, a logical rule historically introduced in the course of studying these complex proofs, as a restricted form of the Connection Method (CM) in ATP. All this is pursued with the goal of enhancing proof search in ATP in mind. As noted, our investigations are guided by a close inspection into proofs by Łukasiewicz and Meredith. In fact, the work presented here amounts at the same time to a very detailed reconstruction of those historical proofs.
The rest of the paper is organized as follows: In Sect. 2 we introduce the problem and a formal human proof that guides our investigations and compare different views on proof structures. We then reconstruct in Sect. 3 the historical method of condensed detachment in a novel way as a restricted variation of the CM where proof structures are represented as terms. This is followed in Sect. 4 by results on reducing the size of such proof terms for application in proof shortening and restricting the proof search space. Section 5 presents a detailed feature table for the investigated human proof, and Sect. 6 shows first experiments where the features and new techniques are used to supplement the inputs of ATP systems with lemmas. Section 7 concludes the paper. Supplementary technical material including proofs is provided in Appendix A. Data and tools to reproduce the experiments are available at http://cs.christophwernhard.com/cd.

Relating Formal Human Proofs with ATP Proofs
In 1948 Jan Łukasiewicz published a formal proof of the completeness of his shortest single axiom for the implicational fragment (IF), that is, classical propositional logic with implication as the only logic operator [21]. In his notation the implication p → q is written as Cpq. Following Frank Pfenning [31] we formalize IF on the meta-level in the first-order setting of modern ATP with a single unary predicate P to be interpreted as something like "provable" and represent the propositional formulas by terms using the binary function symbol i for implication. We will be concerned with the following formulas.
IF can be axiomatized by the set of the three axioms Simp, Peirce and Syll, known as the Tarski-Bernays axioms. In 1925 Alfred Tarski raised the problem of characterizing IF by a single axiom and solved it with very long axioms. This led to a search for the shortest single axiom, which Łukasiewicz found in 1936 with the axiom nicknamed after him [21]. In 1948 he published his derivation that Łukasiewicz entails the three Tarski-Bernays axioms, expressed formally by the method of substitution and detachment. Detachment is also familiar as modus ponens. Łukasiewicz's proof involves 34 applications of detachment. Among the Tarski-Bernays axioms Syll is by far the most challenging to prove, hence his proof centers around the proof of Syll, with Peirce and Simp spinning off as side results. Carew A. Meredith presented in [26] a "very slight abridgement" of Łukasiewicz's proof, expressed in his framework of condensed detachment [32], where the performed substitutions are no longer explicitly presented but implicitly assumed through unification. Meredith's proof involves only 33 applications of detachment.

[Fig. 1: Pi(i(ipq, r), i(irp, isp)) ∧ (Px ∧ Pixy → Py) → Pi(ipq, i(iqr, ipr)), shown with its unifiable connections.]

In our first-order setting, detachment can be modeled with the following meta-level axiom.
Det ≝ ∀xy (Px ∧ Pixy → Py).

In Det the atom Px is called the minor premise, Pixy the major premise, and Py the conclusion. Let us now focus on the following particular formula.
ŁDS ≝ Łukasiewicz ∧ Det → Syll.

"Problem ŁDS" is then the problem of determining the validity of the first-order formula ŁDS. In view of the CM [1,2,3], a formula is valid if there is a spanning and complementary set of connections in it. In Fig. 1 ŁDS is presented again, with nicknames dereferenced and quantifiers omitted as usual in ATP, together with the five unifiable connections in it. Observe that p, q, r, s on the left side of the main implication are variables, while p, q, r on the right side are Skolem constants. Any CM proof of ŁDS consists of a number of instances of the five shown connections. Meredith's proof, for example, corresponds to 491 instances of Det, each linked with three instances of its five incident connections.

Figure 2 compares different representations of a short formal proof with the Det meta axiom. There is a single axiom, Syll Simp, and the theorem is ∀pqrstu Pi(p, i(q, i(r, i(s, i(t, ius))))). Figure 2a shows the structure of a CM proof. It involves seven instances of Det, shown in columns D_1, ..., D_7. The major premise Pix_i y_i is displayed there on top of the minor premise Px_i and the (negated) conclusion ¬Py_i, where x_i, y_i are variables. Instances of the axiom appear as literals ¬Pa_i, with a_i a shorthand for the term i(i(ip_i q_i, r_i), iq_i r_i). The rightmost literal Pg is a shorthand for the Skolemized theorem. The clause instances are linked through edges representing connection instances. The edge labels identify the respective connections as in Fig. 1. An actual connection proof is obtained by supplementing this structure with a substitution under which all pairs of literals related through a connection instance become complementary. Figure 2b represents the tree implicit in the CM proof. Its inner nodes correspond to the instances of Det, and its leaf nodes to the instances of the axiom.
Edges appear ordered to the effect that those originating in a major premise of Det are directed to the left and those from a minor premise to the right. The goal clause Pg is dropped. The resulting tree is a full binary tree, i.e., a binary tree where each node has 0 or 2 children. We observe that the ordering of the children makes the connection labeling redundant as it directly corresponds to the tree structure.

Figure 2c presents the proof in Meredith's notation. Each line shows a formula, line 1 the axiom (CCCpqrCqr) and lines 2-4 derived formulas, with proofs annotated in the last column. Proofs are written as terms in Polish notation with the binary function symbol D for detachment, where the subproofs of the major and minor premise are supplied as first and second argument, respectively. Formula 4, for example, is obtained as conclusion of Det applied to formula 2 as major premise and, as minor premise, another formula that is not made explicit in the presentation, namely the conclusion of Det applied to formula 3 as both major and minor premise. An asterisk marks the goal theorem.

Figure 2d is like Fig. 2b, but with a different labeling: node labels now refer to the line in Fig. 2c that corresponds to the subproof rooted at the node. The blank node represents the mentioned subproof of the formula that is not made explicit in Fig. 2c. An inner node represents a condensed detachment step applied to the subproof of the major premise (left child) and minor premise (right child). Figure 2e shows a DAG (directed acyclic graph) representation of Fig. 2d. It is the unique maximally factored DAG representation of the tree, i.e., it has no multiple occurrences of the same subtree. Each of the four proof line labels of Fig. 2c appears exactly once in the DAG.

We conclude this introductory section by reproducing Meredith's refinement of Łukasiewicz's completeness proof in Fig. 3, taken from [26]. Since we will often refer to this proof, we call it MER.
There is a single axiom (1), which is Łukasiewicz. The proven theorems are Syll (17), Peirce (18) and Simp (19). In addition to line numbers, the symbol n also appears in some of the proof terms. Its meaning will be explained later on in the context of Def. 19. For now, we can read n just as "1". Dots are used in the Polish notation to disambiguate numeric identifiers with more than a single digit.

Condensed Detachment and a Formal Basis
Following [4], the idea of condensed detachment can be described as follows: Given premises F → G and H, we can conclude G', where G' is the most general result that can be obtained by using a substitution instance H' as minor premise with the substitution instance F' → G' as major premise in modus ponens. Condensed detachment was introduced by Meredith in the mid-1950s as an evolution of the earlier method of substitution and detachment, where the involved substitutions were explicitly given. The original presentations of condensed detachment are informal, by means of examples [32,18,33,27]; formal specifications were given later [17,14,4]. In ATP, the rendering of condensed detachment by hyperresolution with the clausal form of axiom Det is so far the prevalent view. As overviewed in [25,37], many of the early successes of ATP were based on condensed detachment. Starting from the hyperresolution view, structural aspects of condensed detachment have been considered by Robert Veroff [40] with the use of term representations of proofs and linked resolution. Results of ATP systems on deriving the Tarski-Bernays axioms from Łukasiewicz are reported in [31,45,24,25,11]. Our goal in this section is to provide a formal framework that makes the achievements of condensed detachment accessible from a modern ATP view. In particular, the framework covers the incorporation of unification, the interplay of nested structures with explicitly and implicitly associated formulas, sharing of structures through lemmas, and the availability of proof structures as terms.

Notation. Most of our notation follows common practice [6], such that we only provide some reminding hints here: s ≥· t expresses that t subsumes s, and s ⊵ t that t is a subterm of s. A position is a sequence of positive integers that specifies the occurrence of a subterm in a term as a path in Dewey decimal notation starting from the root of the term. The set of all positions of a term s is denoted by Pos(s).
For example, Pos(f(x, g(y))) = {ε, 1, 2, 2.1}, where ε denotes the empty sequence, i.e., the root position. For p ∈ Pos(s), s|_p denotes the subterm of s at position p, and s[t]_p the term obtained from s by replacing the subterm occurrence at position p with term t.
In addition, we make use of a few special symbols and conventions: The set of positions p ∈ Pos(s) such that s|_p is a variable or a constant is denoted by LeafPos(s), and the set of positions p ∈ Pos(s) such that s|_p is a compound term by InnerPos(s). We use the postfix notation for the application of a substitution σ also for sets M of pairs of terms: Mσ stands for {{sσ, tσ} | {s, t} ∈ M}. For terms s, t, u, the expression s[t → u] denotes s after simultaneously replacing all occurrences of t with u. The height height(s) of a term s is, viewing the term as a tree, the number of edges of the longest downward path from the root to a leaf. In the literature it is also called the depth of the term. If F is a formula, then ∀F denotes the universal closure of F.
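The position notation can be made concrete with a small sketch. It assumes a representation that is not part of the paper: compound terms are Python tuples whose first component is the function symbol, variables and constants are strings, and positions are tuples of positive integers with the empty tuple playing the role of ε. All function names are illustrative.

```python
def positions(s):
    """Pos(s): all positions of term s, the root position () first."""
    result = [()]
    if isinstance(s, tuple):
        for k, arg in enumerate(s[1:], start=1):
            result += [(k,) + q for q in positions(arg)]
    return result

def subterm_at(s, p):
    """s|_p: the subterm of s at position p."""
    for k in p:
        s = s[k]  # index 0 holds the function symbol, arguments start at 1
    return s

def replace_at(s, p, t):
    """s[t]_p: s with the subterm occurrence at position p replaced by t."""
    if not p:
        return t
    k = p[0]
    return s[:k] + (replace_at(s[k], p[1:], t),) + s[k + 1:]

def leaf_pos(s):
    """LeafPos(s): positions of variables and constants."""
    return [p for p in positions(s) if not isinstance(subterm_at(s, p), tuple)]

def inner_pos(s):
    """InnerPos(s): positions of compound subterms."""
    return [p for p in positions(s) if isinstance(subterm_at(s, p), tuple)]

def height(s):
    """Number of edges on the longest path from the root to a leaf."""
    if not isinstance(s, tuple):
        return 0
    return 1 + max(height(arg) for arg in s[1:])

term = ("f", "x", ("g", "y"))          # f(x, g(y))
print(positions(term))                  # the positions ε, 1, 2, 2.1 as tuples
print(subterm_at(term, (2, 1)))         # the subterm y
print(replace_at(term, (2,), "z"))      # f(x, z)
```

For the example term f(x, g(y)) the sketch yields the four positions listed above, with leaf positions 1 and 2.1 and inner positions ε and 2.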

Proof Structures: D-Terms, Tree Size and Compacted Size
In this section we consider only the purely structural aspects of condensed detachment proofs. Emphasis is on a twofold view on the proof structure, as a tree and as a DAG (directed acyclic graph), which factorizes multiple occurrences of the same subtree. Both representation forms are useful: the compacted DAG form captures that lemmas can be repeatedly used in a proof, whereas the tree form facilitates specifying properties in an inductive manner. We represent proof trees by terms with the binary function symbol D and call these terms D-terms.

A finite tree and, more generally, a finite set of finite trees can be represented as a DAG, where each node in the DAG corresponds to a subtree of a tree in the given set. It is well known that there is a unique minimal such DAG, which is maximally factored (it has no multiple occurrences of the same subtree) or, equivalently, is minimal with respect to the number of nodes, and, moreover, can be computed in linear time [7]. The number of nodes of the minimal DAG is the number of distinct subtrees of the members of the set of trees.

There are two useful notions of measuring the size of a D-term, based directly on its tree representation and based on its minimal DAG, respectively. The tree size of a D-term can equivalently be characterized as the number of its inner nodes. The compacted size of a D-term is the number of its distinct compound subterms. It can equivalently be characterized as the number of the inner nodes of its minimal DAG. As an example consider the D-term d = D(D(1, 1), D(D(1, D(1, 1)), D(1, D(1, 1)))), whose minimal DAG is shown in Fig. 2e. The tree size of d is t-size(d) = 7 and the compacted size of d is c-size(d) = 4, corresponding to the cardinality of the set of compound subterms of d, i.e., {D(1, 1), D(1, D(1, 1)), D(D(1, D(1, 1)), D(1, D(1, 1))), d}.
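The two size measures can be illustrated with a minimal sketch. The representation is an assumption made for illustration: a compound D-term is a tuple ("D", major, minor) and a primitive D-term is an integer such as 1. The example term is the proof of line 4 of Fig. 2c with the line references unfolded; reading lines 2 and 3 as D(1, 1) and D(1, D(1, 1)) is an assumption of this sketch.

```python
def D(major, minor):
    """Build a compound D-term from the subproofs of major and minor premise."""
    return ("D", major, minor)

def compound_subterms(d):
    """Set of distinct compound subterms of d (including d itself if compound)."""
    if not isinstance(d, tuple):
        return set()
    return {d} | compound_subterms(d[1]) | compound_subterms(d[2])

def t_size(d):
    """Tree size: number of inner nodes, i.e., occurrences of D."""
    if not isinstance(d, tuple):
        return 0
    return 1 + t_size(d[1]) + t_size(d[2])

def c_size(d):
    """Compacted size: number of distinct compound subterms,
    i.e., inner nodes of the minimal DAG."""
    return len(compound_subterms(d))

d2 = D(1, 1)
d3 = D(1, d2)
d4 = D(d2, D(d3, d3))
print(t_size(d4), c_size(d4))
```

Representing D-terms as hashable tuples means that structurally equal subterms coincide in a Python set, so the set returned by compound_subterms corresponds to the inner nodes of the minimal DAG.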
The tabular presentation of proof MER (Fig. 3) renders its DAG structure as a mapping of the line numbers to the trees, i.e., D-terms, in the right column. The DAG represents a set of three trees corresponding to proofs of Syll (line 17), Peirce (line 18) and Simp (line 19), respectively. The compacted size of the set of these three is 33, which can be determined by counting the occurrences of D in the right column. For the individual subproofs, the compacted size can be determined by counting the occurrences of D in only those lines that can be reached via the mapping from the respective root, and the tree size by counting the occurrences of D after unfolding the respective root according to the mapping. The proof of Syll in MER, for example, has compacted size 31 and tree size 491.
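The counting procedure just described can be sketched for a small line table. The table below is a hypothetical stand-in in the style of Fig. 2c, not the table of MER: None marks an axiom line, and integers inside a proof term refer to other lines. The sketch further assumes that no compound subterm is written out twice across different lines, so that counting D per reachable line gives the compacted size.

```python
from functools import lru_cache

# Hypothetical line table (assumption): line 1 is the axiom.
TABLE = {
    1: None,
    2: ("D", 1, 1),
    3: ("D", 1, 2),
    4: ("D", 2, ("D", 3, 3)),
}

def count_d(term):
    """Occurrences of D written in a single proof term."""
    if not isinstance(term, tuple):
        return 0
    return 1 + count_d(term[1]) + count_d(term[2])

def line_refs(term):
    """Line numbers occurring in a proof term, with repetitions."""
    if not isinstance(term, tuple):
        return [term]
    return line_refs(term[1]) + line_refs(term[2])

@lru_cache(maxsize=None)
def tree_size(line):
    """Occurrences of D after unfolding the line, computed without unfolding."""
    term = TABLE[line]
    if term is None:
        return 0
    return count_d(term) + sum(tree_size(r) for r in line_refs(term))

def reachable(line, seen=None):
    """Lines reachable from the given line via the mapping."""
    seen = set() if seen is None else seen
    if line not in seen:
        seen.add(line)
        if TABLE[line] is not None:
            for r in line_refs(TABLE[line]):
                reachable(r, seen)
    return seen

def compacted_size(line):
    """Occurrences of D in the lines reachable from the given root line."""
    return sum(count_d(TABLE[l]) for l in reachable(line))

print(tree_size(4), compacted_size(4))
```

Memoizing tree_size per line avoids materializing the unfolded tree, which matters when the tree size is much larger than the compacted size, as for the proof of Syll in MER.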
As will be explicated in more detail below, each occurrence of the function symbol D in a D-term corresponds to an instance of the meta-level axiom Det in the represented proof. Hence the tree size measures the number of instances of Det in the proof. Another view is that each occurrence of D in a D-term corresponds to a condensed detachment step, without re-using already proven lemmas. The compacted size of a D-term is the number of its distinct compound subterms, corresponding to the view that the size of the proof of a lemma is only counted once, even if it is used multiple times in the proof. Tree size and compacted size of D-terms have been previously identified as relevant proof size measures in [40], called there CDcount and length, respectively.

Proof Structures, Formula Substitutions and Semantics
We use a notion of unifier that applies to a set of pairs of terms, as is common in discussions based on the CM [1,9,8]. Although a unifier of a finite set of pairs {{s_1, t_1}, ..., {s_n, t_n}} can be expressed as a unifier of the single pair {f(s_1, ..., s_n), f(t_1, ..., t_n)} of terms, the explicit definition for a set of pairs is useful because such pairs naturally arise in the CM and in the related proof trees, D-terms, in condensed detachment. The additional properties required for clean most general unifiers do not hold for all most general unifiers. However, the unification algorithms known from the literature produce clean most general unifiers [9, Remark 4.2]. If a set of pairs of terms has a unifier, then it has a most general unifier and, moreover, also a clean most general unifier. Since we define mgu(M) as a clean most general unifier, we are permitted to make use of the assumption that it is idempotent and that all variables occurring in its domain and range occur in M. Convention 4.ii has the purpose of reducing clutter in proposition, lemma and theorem statements.
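Unification of a set of pairs can be sketched as follows. This is a generic textbook-style procedure under assumptions of this sketch, not the definition used in the paper: variables are strings, compound terms are tuples headed by their function symbol, constants are one-element tuples such as ("a",), and the unordered pairs {s, t} are given as ordered Python pairs. The returned substitution is fully resolved, so it is idempotent, and it only mentions variables that occur in the input.

```python
def walk(t, subst):
    """Follow variable bindings until an unbound variable or a compound term."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v occur in t under the current bindings?"""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def apply(t, subst):
    """Apply the substitution exhaustively to term t."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(a, subst) for a in t[1:])
    return t

def mgu(pairs):
    """Most general unifier of a collection of pairs, or None if none exists."""
    subst, stack = {}, list(pairs)
    while stack:
        s, t = stack.pop()
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if isinstance(t, str) and not isinstance(s, str):
            s, t = t, s                      # put the variable first
        if isinstance(s, str):
            if occurs(s, t, subst):
                return None                  # cyclic binding
            subst[s] = t
        elif s[0] == t[0] and len(s) == len(t):
            stack.extend(zip(s[1:], t[1:]))  # decompose argument-wise
        else:
            return None                      # symbol clash
    return {v: apply(v, subst) for v in subst}

sigma = mgu([(("i", "X", "Y"), ("i", ("i", "Z", "Z"), "X"))])
print(sigma)
```

Collecting bindings in triangular form and resolving them only at the end keeps the loop simple; the final dictionary comprehension is what makes the result idempotent.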
The structural aspects of condensed detachment proofs represented by D-terms, i.e., full binary trees, will now be supplemented with associated formulas. Condensed detachment proofs, similar to CM proofs, involve different instances of the input formulas (viewed as quantifier-free, e.g., clauses), which may be considered as obtained in two steps: first, "copies", that is, variants with fresh variables, of the input formulas are created; second, a substitution is applied to these copies. Let us consider now the first step. The framework of D-terms permits giving the variables in the copies canonical designators with an index subscript that identifies the position in the structure, i.e., in the D-term, or tree. Recall that positions are path specifiers. For a given D-term d and leaf position p of d the variables x^i_p are for use in a formula associated with p which is the copy of an axiom. Different variables in the copy are distinguished by the upper index i. If p is a non-leaf position of d, then y_p denotes the variable in the conclusion of the copy of Det that is represented by p. In addition, y_p for leaf positions p may occur in the antecedents of the copies of Det. The following equivalences, which hold for all positions p, justify this coupling of positions and the variables x^i_p, y_p for Łukasiewicz as an example of an application axiom and for Det.
Here the major premise of Det appears to the left of the minor one, matching the argument order of the D function symbol. The following substitution shift_p is a tool to systematically rename position-associated variables while preserving the internal relationships between the index-referenced positions.
Definition 6. For all positions p define the substitution shift_p as follows: shift_p ≝ {y_q → y_{p.q} | q is a position} ∪ {x^i_q → x^i_{p.q} | i ≥ 1 and q is a position}.

The application of shift_p to a term s effects that p is prepended to the position indexes of all the position-associated variables occurring in s. The association of axioms with primitive D-terms is represented by mappings which we call axiom assignments, defined as follows.
Definition 7. An axiom assignment α is a mapping whose domain is a set of primitive D-terms and whose range is a set of terms whose variables are in {x^i_ε | i ≥ 1}.

We define a shorthand for a form of Łukasiewicz that is suitable for use as a range element of axiom assignments. It is parameterized with a position p.
In Meredith's proof presentation the axiom assignment is represented by the steps with no trailing D-term, such as line 1 in Fig. 2c and in Fig. 3. The second step of obtaining the instances involved in a proof can be performed by applying the most general unifier of a set of pairs of terms that constrain it. The tree structure of D-terms permits associating exactly one such pair with each term position. Inner positions represent detachment steps and leaf positions instances of an axiom according to a given axiom assignment. The following definition specifies these constraining pairs.
A unifier of the set of pairings of all positions of a D-term d equates for a leaf position p the variable y_p with the value of the axiom assignment α for the primitive D-term at p, after "shifting" variables by p. This "shifting" means that the position subscript of the variables in the axiom argument term α(d|_p) is replaced by p, yielding a dedicated copy of the axiom argument term for the leaf position p. For inner positions p the unifier equates y_{p.1} and i(y_{p.2}, y_p), reflecting that the major premise of Det is proven by the left child of p. The substitution induced by the pairings associated with the positions of a D-term allows associating a specific formula with each position of the D-term, called the in-place theorem (IPT). The case where the position is the top position is distinguished as most general theorem (MGT).
Definition 9. For D-terms d, positions p ∈ Pos(d) and axiom assignments α for d define the in-place theorem (IPT) of d at p for α, Ipt_α(d, p), as the atom P(y_p σ), where σ is the mgu of the set of pairings of all positions of d for α, and the most general theorem (MGT) of d for α, Mgt_α(d), as Ipt_α(d, ε).

Since Ipt and Mgt are defined on the basis of mgu, they are undefined if the set of pairs of terms underlying the respective application of mgu is not unifiable. Hence, we apply the convention of Def. 4.ii for mgu also to occurrences of Ipt and Mgt. If Ipt and Mgt are defined, they both denote an atom whose variables are constrained by the clean property of the underlying application of mgu. The following proposition relates IPT and MGT with respect to subsumption.
Proposition 10. For all D-terms d, positions p ∈ Pos(d) and axiom assignments α for d it holds that Ipt_α(d, p) ≥· Mgt_α(d|_p).

By Prop. 10, the IPT at some position p of a D-term d is subsumed by the MGT of the subterm d|_p of d rooted at position p. An intuitive argument is that the only constraints that determine the most general unifier underlying the MGT are induced by positions of d|_p, that is, below p (including p itself). In contrast, the most general unifier underlying the IPT is determined by all positions of d.
The following lemma expresses the core relationships between a proof structure (a D-term), a proof substitution (accessed via the IPT) and semantic entailment of associated formulas.
Lemma 11. Let d be a D-term and let α be an axiom assignment for d. Then for all p ∈ Pos(d) it holds that:

Based on this lemma, the following theorem shows how Det together with the axioms in an axiom assignment entails the MGT of a given D-term.
Theorem 12. Let d be a D-term and let α be an axiom assignment for d. Then Det ∧ ⋀_{p ∈ LeafPos(d)} ∀P(α(d|_p)) ⊨ ∀Mgt_α(d).

Theorem 12 states that Det together with the axioms referenced in the proof, that is, the values of α for the leaf nodes of d considered as universally closed atoms, entails the universal closure of the MGT of d for α. The universal closure of the MGT is the formula exhibited in Meredith's proof notation in the lines with a trailing D-term, such as lines 2-19 in Fig. 3.
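The construction of this section can be sketched end to end for small D-terms. The sketch follows the prose description of the pairings: at a leaf position p it pairs y_p with the shifted axiom term, and at an inner position p it pairs y_{p.1} with i(y_{p.2}, y_p). Everything else is an assumption made for illustration: variables are strings of the form "name@position", D-terms are tuples ("D", major, minor) with integer leaves, the axiom assignment is a dictionary, and the function names are hypothetical. The returned value is the argument term of the atom, without the predicate P. The axiom used in the example is the one of Fig. 2, i(i(ipq, r), iqr).

```python
def var(name, pos):
    """Position-associated variable, e.g. var('y', (1, 2)) is 'y@1.2'."""
    return f"{name}@{'.'.join(map(str, pos))}"

def shift(t, p):
    """Prepend position p to the position index of every variable in t."""
    if isinstance(t, str):
        name, _, q = t.partition("@")
        q = tuple(int(k) for k in q.split(".")) if q else ()
        return var(name, p + q)
    return (t[0],) + tuple(shift(a, p) for a in t[1:])

def pairing(d, alpha, p=()):
    """One constraining pair for each position of the D-term d."""
    if not isinstance(d, tuple):                       # leaf: axiom copy
        return [(var("y", p), shift(alpha[d], p))]
    own = (var("y", p + (1,)), ("i", var("y", p + (2,)), var("y", p)))
    return [own] + pairing(d[1], alpha, p + (1,)) + pairing(d[2], alpha, p + (2,))

def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def apply(t, s):
    t = walk(t, s)
    return (t[0],) + tuple(apply(a, s) for a in t[1:]) if isinstance(t, tuple) else t

def mgu(pairs):
    """Idempotent most general unifier of the pairs, or None."""
    s, stack = {}, list(pairs)
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if isinstance(b, str) and not isinstance(a, str):
            a, b = b, a
        if isinstance(a, str):
            if occurs(a, b, s):
                return None
            s[a] = b
        elif a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))
        else:
            return None
    return {v: apply(v, s) for v in s}

def ipt(d, alpha, p):
    """Argument term of the in-place theorem at p, or None if undefined."""
    sigma = mgu(pairing(d, alpha))
    return None if sigma is None else apply(var("y", p), sigma)

def mgt(d, alpha):
    """Argument term of the most general theorem: the IPT at the root."""
    return ipt(d, alpha, ())

# Axiom i(i(i(x1, x2), x3), i(x2, x3)) with root-indexed variables.
ALPHA = {1: ("i", ("i", ("i", "x1@", "x2@"), "x3@"), ("i", "x2@", "x3@"))}
print(mgt(("D", 1, 1), ALPHA))
```

For D(1, 1) the result has the shape i(A, i(B, A)) with two distinct variables A and B, that is, CpCqp in Polish notation up to renaming.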

Reducing the Proof Size by Replacing Subproofs
The term view on proof trees suggests shortening proofs by rewriting subterms, that is, replacing occurrences of subproofs by other ones, with three main aims: (1) To shorten given proofs, with respect to the tree size or the compacted size. (2) To investigate whether given proofs can be shortened by certain rewritings or are closed under these. (3) To develop notions of redundancy for use in proof search. A proof fragment constructed during search may be rejected if it can be rewritten to a shorter one.
It is obvious that if a D-term d' is obtained from a D-term d by replacing an occurrence of a subterm e with a D-term e' such that t-size(e) ≥ t-size(e'), then also t-size(d) ≥ t-size(d'). Based on the following ordering relations on D-terms, which we call compaction orderings, an analogous property for reducing the compacted size instead of the tree size can be stated.
The relations d ≥_c e and d >_c e compare D-terms d and e with respect to the superset relationship of their sets of those strict subterms that are compound terms.
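A minimal sketch of such orderings is given below. It is an assumption based only on the informal description above, not a transcription of the formal definition: d ≥_c e is read as "the set of compound strict subterms of d is a superset of that of e", and d >_c e as the corresponding strict superset relationship. D-terms are represented as tuples ("D", major, minor) with integer leaves.

```python
def D(major, minor):
    return ("D", major, minor)

def compound_subterms(d):
    """All compound subterms of d, including d itself if it is compound."""
    if not isinstance(d, tuple):
        return set()
    return {d} | compound_subterms(d[1]) | compound_subterms(d[2])

def strict_compound_subterms(d):
    """Compound subterms of d other than d itself."""
    return compound_subterms(d) - {d}

def ge_c(d, e):
    """Assumed reading of d >=_c e: superset of compound strict subterms."""
    return strict_compound_subterms(d) >= strict_compound_subterms(e)

def gt_c(d, e):
    """Assumed reading of d >_c e: strict superset."""
    return strict_compound_subterms(d) > strict_compound_subterms(e)

d2 = D(1, 1)
big = D(d2, D(1, d2))
small = D(d2, d2)
print(ge_c(big, small), gt_c(big, small), ge_c(small, big))
```

Under this reading, replacing an occurrence of big by small cannot introduce compound subterms that were not already present, which is the intuition behind using the ordering to bound the compacted size.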
Theorem 14.i states that if d' is the D-term obtained from d by simultaneously replacing all occurrences of a compound D-term e with a "c-smaller" D-term e', i.e., e ≥_c e', then the compacted size of d' is less than or equal to that of d. As stated with the supplementary Theorem 14.ii, the sc-size is a measure that strictly decreases under the strict precondition e >_c e', which is useful to ensure termination of rewriting. The following proposition characterizes the number of D-terms that are smaller than a given D-term w.r.t. the compaction ordering ≥_c.
By Prop. 15, for a given D-term d, the number of D-terms e that are smaller than d with respect to ≥_c is only quadratically larger than the compacted size of d and hence also than the tree size of d. Hence techniques that inspect all these smaller D-terms for a given D-term can efficiently be used in practice.

According to Theorem 12, a condensed detachment proof, i.e., a D-term d and an axiom assignment α, proves the MGT of d for α along with instances of the MGT. In general, replacing subterms of d should yield a proof of at least these theorems, that is, a proof whose MGT subsumes the original one. Hence we are interested in identifying conditions that ensure that subterm replacement steps yield proofs with an MGT that subsumes the MGT before the replacement. The following theorems express such conditions.

Theorem 16. Let d, e be D-terms, let α be an axiom assignment for d and for e, and let p_1, ..., p_n, where n ≥ 0, be positions in Pos(d) such that Ipt_α(d, p_i) ≥· Mgt_α(e) for all i ∈ {1, ..., n}. Then Mgt_α(d) ≥· Mgt_α(d[e]_{p_1} ... [e]_{p_n}).

Theorem 16 states that simultaneously replacing a number of occurrences of possibly different subterms in a D-term by the same subterm with the property that its MGT subsumes each of the IPTs of the original occurrences results in an overall D-term whose MGT subsumes that of the original overall D-term. The following theorem is similar, but restricted to the case of a single replaced subterm occurrence and with a stronger precondition. It follows from Theorem 16 and Prop. 10.
Simultaneous replacements of subterm occurrences are essential for reducing the compacted size of proofs according to Theorem 14. For replacements according to Theorem 17 they can be achieved by successive replacements of individual occurrences. In Theorem 16 simultaneous replacements are explicitly considered because the replacement of one occurrence according to this theorem can invalidate the preconditions for another occurrence. Theorem 17 can be useful in practice because the precondition Mgt_α(d|_p) ≥· Mgt_α(e) can be evaluated on the basis of α, e and just the subterm d|_p of d, whereas determining Ipt_α(d, p) for Theorem 16 requires also consideration of the context of p in d. Based on Theorems 16 and 14 we define the following notions of reduction and regularity.
Definition 18. Let d be a D-term, let e be a subterm of d and let α be an axiom assignment for d. For D-terms e' the D-term d[e → e'] is then obtained by C-reduction from d for α if e >_c e', Mgt_α(e') is defined, and for all positions p ∈ Pos(d) such that d|_p = e it holds that Ipt_α(d, p) ≥· Mgt_α(e'). The D-term d is called C-reducible for α if there exists a D-term that is obtained by C-reduction from d for α. Otherwise, d is called C-regular.

If d' is obtained from d by C-reduction, then by Theorems 16 and 14 it follows that Mgt_α(d) ≥· Mgt_α(d'), that the compacted size of d' is less than or equal to that of d, and that the sc-size of d' is strictly smaller than that of d. C-regularity differs from well-known concepts of regularity in clausal tableaux (see, e.g., [15]) in two respects: (1) In the comparison of two nodes on a branch (which is done by subsumption as in tableaux with universal variables) for the upper node the stronger instantiated IPT is taken and for the lower node the more weakly instantiated MGT.
(2) C-regularity is not based on relating two nested subproofs, but on comparison of all occurrences of a subproof with respect to all proofs that are smaller with respect to the compaction ordering.
Proofs may involve applications of Det where the conclusion Py is actually independent from the minor premise Px. Any axiom can then serve as a trivial minor premise. Meredith expresses this with the symbol n as second argument of the respective D-term. Our function simp-n simplifies D-terms by replacing subterms with n accordingly on the basis of the preservation of the MGT.
Definition 19. If d is a D-term and α is an axiom assignment for d, then the n-simplification of d with respect to α is the D-term simp-n_α(d), where simp-n is the following function:

Properties of Meredith's Refined Proof
Our framework renders condensed detachment as a restricted form of the CM. This view permits considering the expanded proof structures as binary trees or D-terms. On this basis we obtain a natural characterization of proof properties in various categories, which seem to be the key towards reducing the search space in ATP. Table 1 shows such properties for each of the 34 structurally different subproofs of proof MER (Fig. 3). Column M gives the number of the subproof in Fig. 3. We use the following short identifiers for the observed properties:

Structural Properties of the D-Term. These properties refer to the respective subproof as D-term or full binary tree.
DT, DC, DH: Tree size, compacted size, height.
DK_L, DK_R: "Successive height", that is, the maximal number of successive edges going to the left (right, resp.) on any path from the root to a leaf.
DP: Is "prime", that is, DT and DC are equal.
DS: Relationship between the subproofs of major and minor premise. Identity is expressed with =, the subterm and superterm relationships with ⊲ and ⊳, resp., and the compaction ordering relationship (if none of the other relationships holds) with <_c and >_c. In addition it is indicated if a subproof is an axiom or n.
DD: "Direct sharings", that is, the number of incoming edges in the DAG representation of the overall proof of all theorems.
DR: "Repeats", that is, the total number of occurrences in the set of expanded trees of all roots of the DAG.

Properties of the MGT. These properties refer to the argument term of the MGT of the respective subproof.
TT, TH: Tree size (defined as for D-terms) and height.
TV: Number of different variables occurring in the term.
TO: Is "organic" [23], that is, the argument term has no strict subterm s such that P(s) itself is a theorem. We call an atom weakly organic (indicated by a gray bullet) if it is not organic and the argument term is of the form i(p, t) where p is a variable that does not occur in the term t and P(t) is organic. For axiomatizations of fragments of propositional logic, organic can be checked by a SAT solver.

[Table 1: properties of the subproofs of proof MER (Fig. 3), with columns M, DT, DC, DH, DK_L, DK_R, DP, DS, DD, DR, TT, TC, TH, TV, TO, RC, MT, MC, ITU, ITM, IHU.]

Table 2. Proof dimensions of various proofs of problem ŁDS.
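Some of the structural properties of D-terms can be computed directly from the term, as in the following sketch. It covers DT, DC, DH, DK_L, DK_R and DP only, assumes the tuple representation ("D", major, minor) with integer leaves, and uses illustrative function names. The example term is a small D-term chosen for illustration, not one of the subproofs of MER.

```python
def D(major, minor):
    return ("D", major, minor)

def dt(d):
    """DT: tree size."""
    return 0 if not isinstance(d, tuple) else 1 + dt(d[1]) + dt(d[2])

def dc(d):
    """DC: compacted size, the number of distinct compound subterms."""
    seen = set()
    def visit(t):
        if isinstance(t, tuple) and t not in seen:
            seen.add(t)
            visit(t[1])
            visit(t[2])
    visit(d)
    return len(seen)

def dh(d):
    """DH: height, the number of edges on the longest root-to-leaf path."""
    return 0 if not isinstance(d, tuple) else 1 + max(dh(d[1]), dh(d[2]))

def successive(d, side):
    """Longest run of successive edges to one side (1 = left, 2 = right)
    on any path from the root to a leaf."""
    best = 0
    def visit(t, run):
        nonlocal best
        best = max(best, run)
        if isinstance(t, tuple):
            for k in (1, 2):
                visit(t[k], run + 1 if k == side else 0)
    visit(d, 0)
    return best

def dk_l(d):
    return successive(d, 1)

def dk_r(d):
    return successive(d, 2)

def dp(d):
    """DP: 'prime', i.e., tree size and compacted size coincide."""
    return dt(d) == dc(d)

d2 = D(1, 1)
d3 = D(1, d2)
d4 = D(d2, D(d3, d3))
print(dt(d4), dc(d4), dh(d4), dk_l(d4), dk_r(d4), dp(d4))
```

A D-term is prime exactly when no compound subterm occurs twice in it, so the DAG view offers no compression; the example term d4 is not prime because d3 and d2 are shared.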

First Experiments
First experiments based on the framework developed in the previous sections are centered around the generation of lemmas where not just formulas but, in the form of D-terms, also proofs are taken into account. This leads in general to a preference for small proofs and to narrowing down the search space by restricted structuring principles to build proofs. The experiments indicate novel calculi (for now at an early stage of development) which combine aspects from lemma-based generative, or bottom-up, methods such as hyperresolution and hypertableaux with structure-based approaches that are typically used in an analytic, or goal-directed, way such as the connection method. In addition, ways of using lemma generation as preprocessing for theorem proving, in particular to obtain short proofs, are suggested. These techniques resulted in a further refinement of Łukasiewicz's proof [21] of the completeness of his single axiom for the implicational fragment, whose compacted size is smaller by one than that of Meredith's refinement [26] and by two than that of Łukasiewicz's original proof.

Table 2 shows compacted size DC, tree size DT and height DH of various proofs of ŁDS. Asterisks indicate that n-simplification was applied with reducing effect on the system's proof. Proof (1.) is the one by Łukasiewicz [21], translated into condensed detachment; proof (2.) is proof MER (Fig. 3) [26]. Rows (3.)-(5.) show results from Prover9, where in (5.) the value of max_depth was limited to 7, motivated by column TH of Table 1. Proof (4.) illustrates the effect of n-simplification. For proofs (6.)-(9.) additional axioms were supplied to Prover9 and CMProver [5,41,42], a goal-directed system that can be described by the CM. Columns indicate the lemma computation method, the number of lemmas supplied to the prover and the time used for lemma computation. Method PrimeCore adds the MGTs of subproof 18 from Table 1 and all its subproofs as lemmas.
Subproof 18 is the largest subproof of proof MER that is prime. On the basis of the axiom it can be characterized almost uniquely as a proof that is prime, whose MGT has no smaller prime proof and has the same number of different variables as the axiom, i.e., 4, and whose size, given as parameter, is 17. Method ProofSubproof is based on detachment steps with a D-term and a subterm of it as proofs of the premises, which, as column DS of Table 1 shows, suffices to justify all except two proof steps in MER. It proceeds in some analogy to the given clause algorithm on lists of D-terms: If d is the given D-term, then the inferred D-terms are all D-terms that have a defined MGT and are of the form D(d, e) or D(e, d), where e is a subterm of d. To determine which of the inferred D-terms are kept, values from Table 1 were taken as a guide, including RC and TO. The first parameter of ProofSubproof is the number of iterations of the "given D-term loop". Proof (9.) can be combined with Peirce and Syll into the overall proof with compacted size 32, one less than MER. The maximal value of DK L is shown as second parameter, because, when limited to 7, proof (9.) cannot be found. Proof (10.), which has a small tree size, was obtained from (8.) by rewriting subproofs with a variation of C-reduction that rewrites single term occurrences, considering also D-terms from a precomputed table of small proofs.
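The inference step of ProofSubproof can be sketched as follows. This is a minimal illustration and not the implementation used in the experiments: the term encoding, the function names, the bottom-up computation of the MGT by condensed detachment, and the pluggable `keep` filter (a stand-in for the selection guided by Table 1) are our assumptions.

```python
import itertools

# Formulas: variables are strings, implications are tuples ('i', a, b).
# D-terms: the primitive D-term is the int 1, compound ones are ('D', major, minor).
LUKASIEWICZ = ('i', ('i', ('i', 'p', 'q'), 'r'),
               ('i', ('i', 'r', 'p'), ('i', 's', 'p')))
_counter = itertools.count()

def fresh_variant(t, renaming):
    """Copy of t with its variables consistently renamed to unused ones."""
    if isinstance(t, str):
        if t not in renaming:
            renaming[t] = f"v{next(_counter)}"
        return renaming[t]
    return ('i', fresh_variant(t[1], renaming), fresh_variant(t[2], renaming))

def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t == v
    return occurs(v, t[1], s) or occurs(v, t[2], s)

def unify(a, b, s):
    """Most general unifier extending substitution s, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, str):
        return None if occurs(b, a, s) else {**s, b: a}
    s = unify(a[1], b[1], s)
    return None if s is None else unify(a[2], b[2], s)

def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return ('i', resolve(t[1], s), resolve(t[2], s))

def mgt(d, axiom=LUKASIEWICZ):
    """Argument term of the MGT of D-term d, or None if undefined."""
    if d == 1:
        return fresh_variant(axiom, {})
    major, minor = mgt(d[1], axiom), mgt(d[2], axiom)
    if major is None or minor is None:
        return None
    y = f"v{next(_counter)}"
    s = unify(major, ('i', minor, y), {})
    return None if s is None else resolve(y, s)

def subterms(d):
    yield d
    if d != 1:
        yield from subterms(d[1])
        yield from subterms(d[2])

def infer(d, axiom=LUKASIEWICZ):
    """All D(d, e) and D(e, d) with e a subterm of d that have a defined MGT."""
    inferred = []
    for e in dict.fromkeys(subterms(d)):          # distinct subterms, stable order
        for cand in (('D', d, e), ('D', e, d)):
            if cand not in inferred and mgt(cand, axiom) is not None:
                inferred.append(cand)
    return inferred

def proof_subproof(iterations, keep=lambda d: True, axiom=LUKASIEWICZ):
    """Given D-term loop; `keep` is a hypothetical selection filter."""
    passive, active = [1], []
    for _ in range(iterations):
        if not passive:
            break
        d = passive.pop(0)
        active.append(d)
        for new in infer(d, axiom):
            if new not in active and new not in passive and keep(new):
                passive.append(new)
    return active + passive
```

A call such as `proof_subproof(3)` returns the D-terms generated after three iterations; their MGTs, obtained with `mgt`, could then be passed to a prover as lemmas.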

Conclusion
Starting out from an investigation of Łukasiewicz's classic formal proof [21] and its refinement by Meredith [26], we arrived at a formal reconstruction of Meredith's condensed detachment as a special case of the CM. The resulting formalism yields proofs as objects of a very simple and common structure: full binary trees which, in the tradition of term rewriting, appear as terms, D-terms, as we call them. To form a full proof, formulas are associated with the nodes of D-terms: axioms with the leaves and lemmas with the remaining nodes, implicitly determined from the axioms through the node position and unification. The root lemma is the most general proven theorem. Lemmas also relate to compressed representations of the binary trees, for example as DAGs, where the re-use of a lemma directly corresponds to sharing the structure of its subproof. For future work we intend to position our approach also in the context of earlier works on proofs, proof compression and lemma introduction, e.g., [44,13], and to consider compressing D-terms in forms that are stronger than DAGs, e.g., by tree grammars [20].
The combination of formulas and explicitly available proof structures naturally leads to theorem proving methods that take structural aspects into account in various ways, as demonstrated by our first experiments. This goes beyond the common clausal tableau realizations of the CM, which in essence operate by enumerating uncompressed proof structures. The discussed notions of regularity and lemma generation methods seem immediately suited for further investigations in the context of first-order theorem proving in general. For other aspects of the work we plan a stepwise generalization by considering further single axioms for the implicational fragment IF [23,21,38], single axioms and axiom pairs for further logics [38], the about 200 condensed detachment problems in the LCL domain of the TPTP, problems which involve multiple non-unit clauses, and an adaptation of D-terms to a variation of binary resolution instead of detachment. In the longer run, our approach aims at providing a basis for approaches to theorem proving with machine learning (e.g. [10,16]). With the reification of proof structures, more information is available as a starting point. As indicated with our exemplary feature table for Meredith's proof, structural properties are thereby considered from a global point of view, as a source for narrowing down the search space in many different ways. This contrasts with the common local view "from within a structure", where the narrowing down is achieved, for example, by focusing on a "current branch" during the construction of a tableau. A general guiding question opened up by our setting concerns the relationships between properties of proof structures and the associated formulas in proofs of meaningful theorems. One may expect that characterizations of these relationships can substantially restrict the search space for finding proofs.

A Proofs of Claims in the Paper, Additional Examples, and Refined Formal Proofs of the Completeness of Łukasiewicz's Single Axiom
A.1 Supplementary Material for Section 3.2
The following example shows for a given D-term the set of associated pairings (Def. 8) with its most general unifier (Defs. 4.i and 3), as well as the IPT and MGT for a specific position in the D-term (Def. 9).
Example A20. Let α be an axiom assignment that maps the primitive D-term 1 to the canonical representation of the axiom Simp. That is, . We can then calculate that Now Ipt(d, 1) and Mgt(d| 1 ) can be determined as follows, where we supplement the values obtained by applying the displayed unifiers with variants that have variable names p, q, r, s and are easier to read: 2 )))) = · P(i(i(p, iqp), i(r, isr))).
Side remark: In this simple example it holds that Mgt(d) ≐ P(i(p, i(q, p))), that is, the MGT of d is a variant of the axiom.
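The values in Example A20 can be retraced by hand. The following derivation is a sketch under the assumption that d = D(D(1, 1), 1), which is consistent with the values shown for Ipt(d, 1) and Mgt(d) but is our own choice of illustration; the variable names are likewise ours. Each detachment step unifies the major premise with i(minor premise, y) for a fresh variable y.

```latex
\begin{align*}
\text{Axiom (Simp):}\quad
  & \mathsf{P}(\mathsf{i}(p, \mathsf{i}(q, p))) \\[2pt]
\text{Step } \mathsf{D}(1,1):\quad
  & \mathsf{i}(p, \mathsf{i}(q, p)) \;\text{unified with}\;
    \mathsf{i}(\mathsf{i}(r, \mathsf{i}(s, r)),\, y) \\
  & \{\, p \mapsto \mathsf{i}(r, \mathsf{i}(s, r)),\;
         y \mapsto \mathsf{i}(q, \mathsf{i}(r, \mathsf{i}(s, r))) \,\} \\
  & \mathrm{Mgt}(\mathsf{D}(1,1)) \doteq
    \mathsf{P}(\mathsf{i}(q, \mathsf{i}(r, \mathsf{i}(s, r)))) \\[2pt]
\text{Step } \mathsf{D}(\mathsf{D}(1,1), 1):\quad
  & \mathsf{i}(q, \mathsf{i}(r, \mathsf{i}(s, r))) \;\text{unified with}\;
    \mathsf{i}(\mathsf{i}(t, \mathsf{i}(u, t)),\, y') \\
  & \{\, q \mapsto \mathsf{i}(t, \mathsf{i}(u, t)),\;
         y' \mapsto \mathsf{i}(r, \mathsf{i}(s, r)) \,\} \\
  & \mathrm{Ipt}(d, 1) \doteq
    \mathsf{P}(\mathsf{i}(\mathsf{i}(t, \mathsf{i}(u, t)),\,
               \mathsf{i}(r, \mathsf{i}(s, r)))) \\
  & \mathrm{Mgt}(d) \doteq \mathsf{P}(\mathsf{i}(r, \mathsf{i}(s, r)))
\end{align*}
```

Under this assumption, the MGT of the subproof at position 1 is strictly more general than the IPT at that position, since the second step instantiates q, while the MGT of d itself is a variant of Simp.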
The following example illustrates the application of shift (Def. 6).
In the second example, observe that position 2. Proof. Easy to see.
The following proposition shows an interplay of pairing and shift that is used later in proofs.
Proposition A23. Let d be a D-term, let p be a position in Pos(d) and let α be an axiom assignment for d. Then Proof. Easy to see.
Sometimes it is useful to refer to all variables associated with positions or associated with members of a given set of positions, regardless of whether they are of the form y p or x i p . The following definition provides a notation for this.
We are now ready to prove Prop. 10.
Proposition 10. For all D-terms d, positions p ∈ Pos(d) and axiom assignments α for d it holds that Proof. Can be shown in the following steps, explained below: Step (3) follows easily from the definition of most general unifier.
Step (4)  Lemma 11. Let d be a D-term and let α be an axiom assignment for d. Then for all p ∈ Pos(d) it holds that: Proof. Let σ = mgu({pairing α (d, q) | q ∈ Pos(d)}) and assume it is defined.
By expanding the definition of Det and rearranging formula components, this entailment can be brought into the following form that obviously holds, as its right side is obtained from instantiating universal quantifiers on the left side: $\forall x y\, (\mathsf{P}x \land \mathsf{P}\mathsf{i}xy \rightarrow \mathsf{P}y) \models \mathsf{P}(y_{p.2}\sigma) \land \mathsf{P}(\mathsf{i}(y_{p.2}, y_{p})\sigma) \rightarrow \mathsf{P}(y_{p}\sigma)$.
Theorem 12. Let d be a D-term and let α be an axiom assignment for d. Then Proof. By induction on the structure of d it follows from Lemma 11 that Contracting the definition of Mgt, the right side of this entailment can be written as Mgt α (d). Since the left side of the entailment has no free variables, we can replace the right side with its universal closure and obtain the statement to be proven.
The proposition can then be shown with the following sequence of equations, explained below: Step (4) follows from the precondition d ∈ D, step (5) from (1). Steps (3) and (7) are obtained by expanding or contracting, resp., the definition of c-size. The remaining steps are easy to see.
The proposition can then be shown with the sequence of equations in the proof of Prop. A28.i, altered in the following way: Step (4) is justified since the precondition d > c e implies d ∈ D. In step (5), the relation ≥ is replaced by >, which is justified by (8).
The converse statements of Prop. A28.i and A28.ii do not hold, as demonstrated by the following example.

Proof. We begin with shared aspects of the proofs of both subtheorems. The D-term e must be in D, which is explicitly stated as a precondition of Theorem 14.i and implied by the precondition e >_c e′ of Theorem 14.ii. There must exist a set {d_1, . . . , d_n} ⊆ D for some n ≥ 0 such that the set S ≝ {f ∈ D | d ⊵ f} of compound subterms of d can be characterized as the disjoint union of three particular subsets: Let T be the set of those proper subterms of e that are compound and have in d an occurrence in a position other than as subterm of e. Clearly {f ∈ D | e ▷ f} ⊇ T. Thus, by (1) we can characterize S also as The set S ≝ {f ∈ D | d ⊵ f} of compound subterms of d can then be characterized as follows: .
From e ≥_c e′, which is a precondition of Theorem 14.i as well as Theorem 14.ii,
We now turn to the two individual subtheorems.
Given the precondition e >_c e′ we can conclude by Theorem 14.i that for each i ∈ {1, . . . , n} it holds that c-size(d_i) ≥ c-size(d_i[e → e′]). Hence: From the precondition e >_c e′ and Prop. A28.ii it follows that c-size(e) > c-size(e′). From (4) and (6) we can then conclude: By (1), sc-size(d) can be characterized as follows: Since the right side of (8) is identical to the left side of (7) it follows that sc-size(d) > sc-size(d′), the conclusion of the subtheorem to be shown.
The following example illustrates the D-term size measure sc-size(d), which was defined with Theorem 14.ii.
Step (5) is obtained by expanding the definition of σ, and step (6) follows since S = T ∪ G.
Step (9) follows from Prop. A32.ii since $\tau|_{\{y_{p_1},\ldots,y_{p_n}\}} \doteq \mathrm{mgu}(\{\{y_{p_1}, y_{p_1}\tau\}, \ldots, \{y_{p_n}, y_{p_n}\tau\}\})$. Finally, step (10) is obtained by Prop. A32.ii and the definition of γ.

We are now ready to prove the core lemma that shows how the subsumption relationship between replaced subterm occurrences and a replacing D-term transfers to the subsumption relationship between the containing D-terms, before and after the replacement. It is the basis of Theorems 16 and 17 below, which express practically useful conditions for subterm replacement of D-terms. The setting of the lemma is illustrated in Fig. 4. Because the detailed proof is lengthy, we present it modularized into four parts: (I) Conversion of the Preconditions, (II) Determining the Instantiating Substitution ρ, (III) Contexts where ρ is Void, and (IV) Deriving the Conclusion. Figure 4 may help to get an intuitive overview of the parameters of the proven lemma.

Part I. Conversion of the Preconditions
The following step is a precondition of the lemma to be proven.
The following statements whose proof is described below show that σ when applied to y q and y pi can be decomposed into γ followed by µ.
Step (2) follows from Lemma A33 with its parameters p 1 , . . . , p n instantiated by the positions of the same name in the lemma to be proven but its parameter q instantiated to p i for an arbitrary i ∈ {1, . . . , n}. The precondition p i < q for all i ∈ {1, . . . , n} of Lemma A33 then instantiates to p j < p i for all j ∈ {1, . . . , n}, which follows from (1).
The conversion of the right side of the considered precondition is based on some auxiliary definitions and statements. For all i ∈ {1, . . . , n} define the following sets of pairs of terms and substitutions: Then, as explained below, for all i, j ∈ {1, . . . , n} the following holds: Step (9) follows immediately from the definitions of T i , T i and T .
Step (10) follows from the definition of T i and the definition of pairing (Def. 8).
Step (18) is obtained from (17) by expanding the definition of Mgt.
Step (19) follows from Prop. A22, step (20) since by the definition of d it holds that d | pi = e, and step (21) from Prop. A23.
Step (22) is obtained by contracting the definition of T i . Step (23) follows from (14). Note that (17) is independent from i and the conversion of (17) to (23) is possible for any i ∈ {1, . . . , n}.
Part II. Determining the Instantiating Substitution ρ We show, as explained below, that for all i ∈ {1, . . . , n} there exists a substitution ρ i with the following properties: Steps (25) and (26) follow from (24).
Step (33) follows from the definition of ρ and step (29).

Part III. Contexts where ρ is Void
The variables occurring in members of the range of γ as well as y q are contained in the same set of position-associated variables: Step (34) follows from the definitions of γ and G and the definition of pairing (Def. 8).
Step (35) follows from the precondition that for all i ∈ {1, . . . , n} it holds that p i < q. Now, let y be a position related variable and let v be a variable such that y ∈ {y p1 , . . . , y pn , y pq }, and v ∈ Var (yγ).
Step (37) is proven by considering three cases (the first two overlap, the third applies if none of the first two applies):

Part IV. Deriving the Conclusion
The conclusion of the lemma to be proven, that is, can be reformulated as
(40) $y_{q}\gamma\mu \mathrel{\dot{\geq}} y_{q}\gamma\nu$.
To prove (40), we need a further auxiliary statement, which is derived along with an intermediate step about the domain of γ as explained below: Step (41) follows from the definitions of γ and G and the definition of pairing (Def. 8).
Step (51) follows from (50). Finally, step (52), which is the goal to be proven listed above as (40), follows from (51) and (39).

a defined MGT. Only two of them, the prime core and another D-term with the same MGT, remain if we, aside from a redundancy criterion (no smaller prime proof of the MGT), require that the number of different variables in the MGT is the same as in the axiom, i.e., 4. Another possibility that, however, leads to larger sets can be based on the property that all subproofs have a weakly organic MGT. Of course, the size parameter 17 in the invocation of PrimeCore has been chosen according to Table 1. In practice, a system could try this form of lemma generation with increasing values of the size parameter. For axiom Łukasiewicz, experiments with other size values did not lead to a substantial decrease of the compacted size.
Notes on the ProofSubproof Lemma Computation Method. Most of the running time for lemma computation with ProofSubproof reported in Table 2 was spent on determining the properties RC (C-regular) and TO (weakly organic). For proofs (7.)-(9.) of Table 2, the MGT of subproof 30 of Table 1 was among the generated lemmas, but it was reached without passing through subproof 27, which has >_c as value of DS. CMProver, which was used for proofs (8.) and (9.), is, compared to LeanCoP [29], more like the Prolog Technology Theorem Prover (PTTP) [36] and SETHEO [19] in that it is based on a compilation of the input clauses to Prolog code. Early experiments where SETHEO was combined with bottom-up lemma generation were described in [35]. In our experiments CMProver was configured such that the cost measure underlying iterative deepening is the number of subgoals [36], reflecting the tree size.
New Short Proofs. Figures 5 and 6 below show proofs obtained in the experiments described in Sect. 6. The proofs are shown in Meredith's notation, like Fig. 2c (p. 4) and Fig. 3 (p. 5). For subproofs whose MGT is also the MGT of a subproof of MER (Fig. 3), the respective line number in MER is annotated with prefix M. Representations of these proofs as Prolog-readable D-terms are provided at http://cs.christophwernhard.com/cd. The subproof of Syll (i.e., problem ŁDS) has compacted size 30 and was obtained in experiment (9.) of Table 2 (p. 14).
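The proof dimensions quoted for these proofs can be recomputed from a D-term with a few lines of code. The sketch below assumes that tree size counts the occurrences of D, that primitive D-terms have height 0, and that compacted size is the number of distinct compound subterms, i.e., the number of inner nodes of the minimal DAG. The tuple encoding is our own and is not the Prolog syntax of the published files.

```python
# D-terms: primitive D-terms are ints (axiom labels), compound ones are
# tuples ('D', major, minor).

def tree_size(d):
    """Number of occurrences of D in the tree representation (assumed DT)."""
    return 0 if isinstance(d, int) else 1 + tree_size(d[1]) + tree_size(d[2])

def height(d):
    """Height of the tree, with primitive D-terms at height 0 (assumed DH)."""
    return 0 if isinstance(d, int) else 1 + max(height(d[1]), height(d[2]))

def compound_subterms(d, seen=None):
    """Set of distinct compound subterms of d, including d itself."""
    seen = set() if seen is None else seen
    if not isinstance(d, int) and d not in seen:
        seen.add(d)
        compound_subterms(d[1], seen)
        compound_subterms(d[2], seen)
    return seen

def compacted_size(d):
    """Number of inner nodes of the minimal DAG of d (assumed DC)."""
    return len(compound_subterms(d))

# A lemma D(1, 1) that is used twice is counted once in the compacted size.
shared = ('D', 1, 1)
example = ('D', shared, shared)
```

For `example`, the tree has three inner nodes and height 2, whereas the DAG has only two, reflecting that a proof in Meredith's notation lists the shared lemma on a single line.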