1 Introduction

The method of synthetic tableaux (ST, for short) is a proof method based entirely on direct reasoning, yet designed in a tableau format. The basic idea is that all the laws of logic, and only the laws of logic, can be derived directly by cases from the parts of some partition of the whole logical space. Hence an ST-proof of a formula typically starts with a division between ‘p-cases’ and ‘\(\lnot p\)-cases’ and continues with further divisions, if necessary. The derivation then proceeds by applying the so-called synthesizing rules, which build complex formulas from their parts—subformulas and/or their negations. For example, if p holds, then every implication with p in the succedent holds, ‘\(q \rightarrow p\)’ in particular; then ‘\(p \rightarrow (q \rightarrow p)\)’ also holds by the same argument. If \(\lnot p\) is the case, then every implication with p in the antecedent holds, thus ‘\(p \rightarrow (q \rightarrow p)\)’ is settled. This kind of reasoning proves that ‘\(p \rightarrow (q \rightarrow p)\)’ holds in every possible case (unless we reject tertium non datur in the partition of the logical space). There are no indirect assumptions, no reductio ad absurdum, no assumptions that need to be discharged. The ST method needs no labels, and no derivation of a normal form (clausal form) is required.

In the case of Classical Propositional Logic (CPL, for short) the method may be viewed as a formalization of the truth-table method. The assumption that p amounts to considering all Boolean valuations that make p true; considering \(\lnot p\) as well exhausts the logical space. The number of cases to be considered corresponds to the number of branches of an ST, and it clearly depends on the number of distinct propositional variables in a formula; thus the upper bound for the complexity of an ST-search is the complexity of the truth-table method. In the worst case this is exponential with respect to the number of variables, but for some classes of formulas truth tables behave better than standard analytic tableaux (see [4,5,6,7] for this diagnosis). However, the ST method can perform better than truth tables, as shown by the example of ‘\(p \rightarrow (q \rightarrow p)\)’, where we do not need to partition the space of valuations into the q/\(\lnot q\) cases. The question, obviously, is: how much better? The considerations presented in this paper aim at developing a quasi-experimental framework for answering it.

The ST method was introduced in [19], then extended to some non-classical logics in [20, 22]. An adjustment to the first-order level was presented in [14]. There have also been interesting applications of the method in the domain of abduction: [12, 13]. On the propositional level, the ST method is both a proof- and a model-checking method, which means that one can examine satisfiability of a formula A (equivalently, falsifiability of \(\lnot A\)) and its validity (equivalently, inconsistency of \(\lnot A\)) at the same time. Normally, one needs to derive a clausal form of both A and \(\lnot A\) to check the two dual semantic properties (satisfiability and validity) with one of the quick methods, while the ST-system is designed to examine both of them at once. Used wisely, this property can help limit the growth of complexity in the verification of semantic properties.

For the purpose of optimizing the ST method we created a heuristic that leads to the construction of a variable ordering—a task similar to the one performed in research on Ordered Binary Decision Diagrams (OBDDs) and, more generally, on the Boolean satisfiability problem (SAT) [8, 15]. In Sect. 3 we sketch a comparison of STs to OBDDs. Let us stress at this point, however, that the aim of our analysis remains proof-theoretical—the ST method is a ‘full-blooded’ proof method working on formulas of arbitrary representation. It has already been adjusted to first-order and to some non-classical logics, and it has a large scope of applications beyond satisfiability checking of clausal forms.

The optimization methods that we present are based on exploratory data analysis performed on millions of tableaux. Some aspects of the analysis are also discussed in the paper. The data are available at https://ddsuam.wordpress.com/software-and-data/.

Here is the plan of what follows. The next section introduces the ST method, Sect. 3 compares STs with analytic tableaux and with BDDs, and Sect. 4 presents the implementation in Haskell. In Sect. 5 we introduce the mathematical concepts needed to analyse heuristics for generating small tableaux. In Sect. 6 we describe the analysed data, and in Sect. 7 the obtained results. Section 8 confronts our approach with the pigeonhole principle, and Sect. 9 indicates plans for further research.

2 The Method of Synthetic Tableaux

Language. Let \(\mathcal {L}_\mathsf {CPL}\) stand for the language of \(\mathsf {CPL}\) with negation, \(\lnot \), and implication, \(\rightarrow \). \(\mathsf {Var}=\{p,q,r,\ldots ,p_i,\ldots \}\) is the set of propositional variables and ‘\(\mathsf {Form}\)’ stands for the set of all formulas of the language, where the notion of formula is understood in the standard way. \(A, B, C, \ldots \) will be used for formulas of \(\mathcal {L}_\mathsf {CPL}\). Propositional variables and their negations are called literals. The length of a formula A is understood as the number of occurrences of characters in A, parentheses excluded.

Let \(A \in \mathsf {Form}\). We define the notion of a component of A as follows. (i) A is a component of A. (ii) If A is of the form ‘\(\lnot \lnot B\)’, then B is a component of A. (iii) If A is of the form ‘\(B \rightarrow C\)’, then ‘\(\lnot B\)’ and C are components of A. (iv) If A is of the form ‘\(\lnot (B \rightarrow C)\)’, then B and ‘\(\lnot C\)’ are components of A. (v) If C is a component of B and B is a component of A, then C is a component of A. (vi) Nothing else is a component of A. By ‘\(\mathsf {Comp}(A)\)’ we mean the set of all components of A. For example, \(\mathsf {Comp}(\,p \rightarrow (q \rightarrow p)\,) = \{p \rightarrow (q \rightarrow p), \lnot p, q \rightarrow p, \lnot q, p\}\). As we can see, a component of a formula is not the same as a subformula: \(\lnot q\) is not a subformula of the law of antecedent, but it is its component, whereas q is its subformula but not its component. Components correspond to uniform notation as defined by Smullyan (see [18]), which is very convenient to use with a larger alphabet. Let us also observe that the association of \(\mathsf {Comp}(A)\) with a Hintikka set is quite natural, although \(\mathsf {Comp}(A)\) need not be consistent. In the sequel we shall also use ‘\(\mathsf {Comp}^\pm (A)\)’ as short for ‘\(\mathsf {Comp}(A) \cup \mathsf {Comp}(\lnot A)\)’.
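Since the notion of a component drives everything that follows, a minimal Haskell sketch may be helpful. The formula type For mirrors the one used in the implementation described in Sect. 4; the function comp itself is our illustration, not part of the implementation:

```haskell
data For = Var String | Neg For | Imp For For
  deriving (Eq, Show)

-- comp a lists the components of a (possibly with repetitions),
-- following clauses (i)-(vi); clause (v) is handled by recursion.
comp :: For -> [For]
comp a = a : case a of
  Neg (Neg b)   -> comp b                  -- clause (ii)
  Neg (Imp b c) -> comp b ++ comp (Neg c)  -- clause (iv)
  Imp b c       -> comp (Neg b) ++ comp c  -- clause (iii)
  _             -> []                      -- literals have no further components

-- Comp±(a) = Comp(a) ∪ Comp(¬a)
compPM :: For -> [For]
compPM a = comp a ++ comp (Neg a)
```

For p = Var "p" and q = Var "q", comp (Imp p (Imp q p)) returns exactly the five components listed above.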

Rules. The ST system consists of a set of rules (see Table 1) and the notion of proof (see Definition 2). The rules can be applied in the construction of an ST for a formula A with the proviso that (a) the premises already occur on the given branch, and (b) the conclusion (both conclusions, in the case of (cut)) of a particular application of the rule belongs to \(\mathsf {Comp}^\pm (A)\). The only branching rule, called (cut) by analogy to its famous sequent-calculus formulation, is at the same time the only rule that needs no premises; hence every ST starts with an application of this rule. If its application creates branches with \(p_i\) and \(\lnot p_i\), then we say that the rule was applied with respect to \(p_i\).

Table 1. Rules of the ST system for \(\mathcal {L}_\mathsf {CPL}\)
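The table itself is not reproduced here. Judging from the clauses defining components and from the derivations in Example 1 below, the rules can plausibly be reconstructed as follows (our reconstruction, to be checked against [19, 21]; only \(\mathbf {r}^1_\rightarrow \) and \(\mathbf {r}^2_\rightarrow \) are named explicitly in the text):

$$(cut)\ \frac{}{p_i \ \mid \ \lnot p_i} \qquad \frac{B}{\lnot \lnot B} \qquad \mathbf {r}^1_\rightarrow \ \frac{\lnot B}{B \rightarrow C} \qquad \mathbf {r}^2_\rightarrow \ \frac{C}{B \rightarrow C} \qquad \frac{B \quad \lnot C}{\lnot (B \rightarrow C)}$$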

One of the nice properties of this method is that it is easy to keep every branch consistent: it is sufficient to restrict the applications of (cut) so that on every branch (cut) is applied with respect to a given variable \(p_i\) at most once. This guarantees that \(p_i, \lnot p_i\) never occur together on the same branch.

The notion of a proof is formalized by that of a tree. If \(\mathcal {T}\) is a labelled tree, then by \(X_\mathcal {T}\) we mean the set of its nodes, and by \({r}_\mathcal {T}\) we mean its root. Moreover, \(\eta _\mathcal {T}\) is used for a function assigning labels to the nodes in \(X_\mathcal {T}\).

Definition 1 (synthetic tableau)

A synthetic tableau for a formula A is a finite labelled tree \(\mathcal {T}\) generated by the above rules, such that \(\eta _\mathcal {T}: X_\mathcal {T} {\setminus } \{{r}_\mathcal {T}\} \longrightarrow \mathsf {Comp}^\pm (A)\) and each leaf is labelled with A or with \(\lnot A\).

\(\mathcal {T}\) is called consistent if the applications of (cut) are subject to the restriction defined above: there are no two applications of (cut) on the same branch with respect to the same variable.

\(\mathcal {T}\) is called regular provided that literals are introduced in the same order on each branch, otherwise \(\mathcal {T}\) is called irregular.

Finally, \(\mathcal {T}\) is called canonical if, first, it is consistent and regular, and second, it starts with the introduction of all possible literals by (cut), and only after that are the other rules applied on the created branches.

In the above definition we have used the notion of literals introduced in the same order on each branch. It seems sufficiently intuitive at the moment, so we postpone the clarification of this notion until the end of this section.

Definition 2 (proof in ST system)

A synthetic tableau \(\mathcal {T}\) for a formula A is a proof of A in the ST system iff each leaf of \(\mathcal {T}\) is labelled with A.

Theorem 1

(soundness and completeness, see [21]). A formula A is valid in \(\mathsf {CPL}\) iff A has a proof in the ST-system.

Example 1

Below we present two different STs for one formula: \(B = p \rightarrow (q \rightarrow p)\). Each of them is consistent and regular. Also, each of them is a proof of the formula in the ST system.

[Figure: two synthetic tableaux, \(\mathcal {T}_1\) and \(\mathcal {T}_2\), for B]

In \(\mathcal {T}_1\): 2 comes from 1 by \(\mathbf {r}^2_\rightarrow \); similarly, 3 comes from 2 by \(\mathbf {r}^2_\rightarrow \). 5 comes from 4 by \(\mathbf {r}^1_\rightarrow \). In \(\mathcal {T}_2\): nothing can be derived from 1, hence an application of (cut) wrt p is the only possible move. The numbering of the nodes is not part of the ST.
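As the figure is not reproduced here, note that the description above determines \(\mathcal {T}_1\) uniquely; a reconstruction, with the two columns being the branches created by (cut) applied with respect to p and node numbers as in the text:

$$\mathcal {T}_1: \qquad \begin{array}{ll} 1.\ p & 4.\ \lnot p\\ 2.\ q \rightarrow p & 5.\ p \rightarrow (q \rightarrow p)\\ 3.\ p \rightarrow (q \rightarrow p) & \end{array}$$

In \(\mathcal {T}_2\), presumably, (cut) is applied first with respect to q (cf. the discussion of q in Sect. 5); nothing can be synthesized from q alone, so (cut) is applied again, with respect to p, on the branch beginning with q.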

There are at least two important size measures used with respect to trees: the number of nodes and the number of branches. As witnessed by our data, there is a very high overall correlation between the two measures; we have thus used only one of them—the number of branches—in further analysis. Among various STs for the same formula there can be those of smaller and those of bigger size. An ST of minimal size is called optimal. In the above example, \(\mathcal {T}_1\) is an optimal ST for B. Let us also observe that there can be many STs for a formula of the same size; in particular, there can be many optimal STs.

Example 2

Two possible canonical synthetic tableaux for \(B = p \rightarrow (q \rightarrow p)\). Each of them is regular and consistent, but clearly not optimal (cf. \(\mathcal {T}_1\)).

[Figure: two canonical synthetic tableaux for B]

In the case of formulas with at most two distinct variables regularity is a trivial property. Below is an example with three variables.

Example 3

\(\mathcal {T}_5\) is an irregular ST for the formula \(C = (p \rightarrow \lnot q) \rightarrow \lnot (r \rightarrow p)\), i.e. variables are introduced in various orders on different branches. \(\mathcal {T}_6\) is an example of an inconsistent ST for C, i.e. there are two applications of (cut) on one branch with respect to p, which results in a branch carrying both p and \(\lnot p\) (the blue one). The whole right subtree of \(\mathcal {T}_5\), starting with \(\lnot p\), occurs twice in \(\mathcal {T}_6\), where it is abbreviated as \(\mathcal {T}^*\). Let us observe that \(\lnot \lnot (r \rightarrow p)\) is a component of \(\lnot C\) due to clause (iv) defining the concept of component.

[Figure: \(\mathcal {T}_5\) and \(\mathcal {T}_6\), STs for C]

On the level of CPL we can restrict ourselves to consistent STs while still having a complete calculus (for details see [19, 21]). The analogue of closing a branch of an analytic tableau for formula A is, in the case of an ST, ending a branch with A synthesized. The fact that an ST for A has a consistent branch ending with \(\lnot A\) witnesses the satisfiability of \(\lnot A\). The situation concerning consistency of branches is slightly different, however, in the formalization of first-order logic presented in [14], as a restriction of the calculus to consistent STs produces an incomplete formalization.

Finally, let us introduce some auxiliary terminology to be used in the sequel. Suppose \(\mathcal {T}\) is an ST for a formula A and \(\mathcal {B}\) is a branch of \(\mathcal {T}\). Literals occur on \(\mathcal {B}\) in an order set by the applications of (cut); suppose that it is \(\langle \pm p_1,\ldots ,\pm p_n \rangle \), where ‘±’ is a negation sign or no sign. In this situation we call the sequence \(o = \langle p_1,\ldots ,p_n \rangle \) the order on \(\mathcal {B}\). It can happen that o contains all variables that occur in A, or that some of them are missing. Suppose that \(q_1,\ldots ,q_m\) are exactly the distinct variables occurring in A. Each permutation of \(q_1,\ldots ,q_m\) will be called an instruction for a branch of an ST for A. Further, we will say that the order o on \(\mathcal {B}\) complies with an instruction I iff either \(o = I\), or o constitutes a proper initial segment of I. Finally, \(\mathcal {I}\) is an instruction for the construction of \(\mathcal {T}\) if \(\mathcal {I}\) is a set of instructions for branches of an ST for A such that for each branch of \(\mathcal {T}\), the order on the branch complies with some element of \(\mathcal {I}\).

Let us observe that in the case of a regular ST a set containing a single instruction for a branch constitutes an instruction for the whole ST, as this one instruction describes all the branches. Let us turn to examples. \(\mathcal {T}_5\) from Example 3 has four branches with the following orders (from the left): \(\langle p,q \rangle \), \(\langle p,q \rangle \), \(\langle p,r \rangle \), \(\langle p,r \rangle \). On the other hand, there are six permutations of p, q, r, and hence six possible instructions for branches of an arbitrary ST for the discussed formula. Order \(\langle p,q \rangle \) complies with instruction \(\langle p,q,r \rangle \), and order \(\langle p,r \rangle \) complies with instruction \(\langle p,r,q \rangle \). The set \(\{\langle p,q,r \rangle , \langle p,r,q \rangle \}\) is an instruction for the construction of an ST for C; more specifically, it is an instruction for the construction of \(\mathcal {T}_5\).
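The compliance relation is just the (non-strict) prefix relation on sequences, so a Haskell sketch is a one-liner (compliesWith is our name, not the implementation's):

```haskell
import Data.List (isPrefixOf)

-- o complies with i iff o = i or o is a proper initial segment of i,
-- i.e. iff o is a prefix of i.
compliesWith :: Eq a => [a] -> [a] -> Bool
compliesWith o i = o `isPrefixOf` i

-- e.g. ["p","q"] `compliesWith` ["p","q","r"]  ==  True
```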

3 ST, Analytic Tableaux, BDDs, and SAT Solvers

The analogy between STs and analytic tableaux sketched in the last paragraph of the previous section breaks down at two points. First, let us repeat: the ST method is both a satisfiability checker and a validity checker at once, just as a truth table is. Second, the analogy breaks down on complexity issues. In the case of analytic tableaux the order of decomposing compound formulas is the key to a minimal tableau. In the case of STs, the key to an optimized use of the method is a clever choice of the variables introduced on each branch.

The main similarity between STs and Binary Decision Diagrams (BDDs, see e.g. [8, 15]) is that both methods involve branching on variables. The main differences concern the representations they work on and their aims: firstly, STs constitute a proof method, whereas BDDs are compact representations of Boolean functions, used mainly for practical purposes such as the design of electronic circuits (VLSI design); secondly, STs apply to logical formulas, whereas the construction of BDDs may start with different representations of Boolean functions, usually circuits or Boolean formulas.

The structure of the constructed tree is also slightly different in the two approaches: in BDDs the inner nodes correspond to variables with outgoing edges labelled with 1 or 0; in STs, on the other hand, inner nodes are labelled with literals or more complex formulas. The terminal nodes of a BDD (also called sinks, labelled with 1 or 0) indicate the value of a Boolean function calculated for the arguments introduced along the path from the root, whereas the leaves of an ST carry a synthesized formula (the initial one or its negation). In addition, the methods differ in terms of the construction process: in the case of BDDs, tree structures are first generated and then reduced to a more compact form using the elimination and merging rules; STs, in turn, are built ‘already reduced’. However, the interpretation of the outcome of both constructions is analogous. Firstly, for a formula A with n distinct variables \(p_1,\ldots ,p_n\) and the associated Boolean function \(f_{A} = f_{A}(x_{1},\ldots ,x_{n})\), the following fact holds: if a branch of an ST containing literals from a set L ends with A or \(\lnot A\) synthesized (which means that assuming that the literals from L are true is sufficient to calculate the value of A), then the two mentioned reduction rules can be used in a BDD for \(f_{A}\), so that the path containing the variables occurring in L, followed by edges labelled according to the signs in L, can be directed to a terminal node (sink). For example, if A can be synthesized on a branch with literals \(\lnot p_{1}\), \(p_{2}\) and \(\lnot p_{3}\), then \(f_{A}(0,1,0,x_{4},\ldots ,x_{n}) =1\) for all values of the variables \(x_{4},\ldots ,x_{n}\), and so the path in the associated BDD containing the variables \(x_{1}, x_{2}\) and \(x_{3}\), followed by the edges labelled with 0, 1 and 0, respectively, leads directly to the sink labelled with 1.

However, the possibility of applying the reduction procedures to a BDD does not always correspond to a possibility of reducing an ST. For example, the reduced BDD for the formula \(p\vee (q\wedge \lnot q)\) consists of a single node labelled with p with two edges directed straight to the sinks 1 and 0; on the other hand, the construction of an ST for this formula requires introducing q following the literal \(\lnot p\). This observation suggests that STs, in general, have greater size than reduced BDDs.

The strong similarity of the two methods is also illustrated by the fact that both allow the construction of a disjunctive normal form (DNF) of the logical or Boolean formula to which they were applied. In the case of an ST, the DNF is the disjunction of the conjunctions of literals that appear on branches ending with the formula synthesized. The smaller the ST, the smaller the DNF. Things are analogous with BDDs.
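A sketch of this extraction, under the assumption that an ST is given simply as the list of its branches, each branch being the list of formulas occurring on it in order; the names below are ours, not the implementation's:

```haskell
data For = Var String | Neg For | Imp For For
  deriving (Eq, Show)

isLiteral :: For -> Bool
isLiteral (Var _)       = True
isLiteral (Neg (Var _)) = True
isLiteral _             = False

-- One conjunct (a list of literals) per branch ending with a synthesized;
-- the resulting list of lists is a DNF of a.
dnfOf :: For -> [[For]] -> [[For]]
dnfOf a branches = [filter isLiteral b | b <- branches, last b == a]
```

For \(\mathcal {T}_1\) of Example 1, whose branches carry p, \(q \rightarrow p\), \(p \rightarrow (q \rightarrow p)\) and \(\lnot p\), \(p \rightarrow (q \rightarrow p)\), this yields the two conjuncts p and \(\lnot p\), i.e. the DNF \(p \vee \lnot p\) of B.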

Due to complexity issues, research on BDDs centers on ordered binary decision diagrams (OBDDs), in which variables appear in the same order on all paths from the root. A number of heuristics have been proposed in order to construct a variable ordering that leads to the smallest OBDDs, using characteristics of the different types of representation of Boolean functions (for example, for circuits, topological characteristics have been used for this purpose). OBDDs are clearly analogous to regular STs, the construction of which also requires finding a good variable ordering leading to a smaller ST. We suppose that our methodology can also be used to find orderings for OBDDs by expressing Boolean functions as logical formulas. It is not clear to us whether the OBDD methodology can be used in our framework.

Let us move on to other comparisons, this time with a lesser degree of detail. It is very instructive to compare the ST method to SAT solvers, as their effectiveness is undeniably impressive nowadays. The ST method does not aim at challenging this effectiveness. Let us explain, however, in what respect the ST method can still be viewed as a computationally attractive alternative to a SAT solver. The latter produces an answer to the question of satisfiability, sometimes also producing examples of satisfying valuations and/or counting the satisfying valuations. In order to obtain an answer to the other question—that of validity—one needs to ask about satisfiability of the negation of the initial problem. As we stressed above, the ST method answers the two questions at once, providing at the same time a description of the classes of valuations satisfying and not satisfying the initial formula. Hence one ST is worth two SAT-checks together with a rough model counting.

Another interesting point concerns clausal forms. The ST method does not require derivation of a clausal form, but the applications of the rules of the system, defined via \(\alpha \)-, \(\beta \)-notation, reflect the breaking of a formula into its components and thus, in a way, lead to a definition of a normal form (a DNF, as we mentioned above). But this is not to say that an ST needs a full conversion to DNF. In this respect the ST method is rather similar to non-clausal theorem provers (e.g. non-clausal resolution, see [9, 17]).

Let us finish this section with a summary of the ST method. Formally, it is a proof method with many applications beyond the realm of CPL. In the area of CPL, semantically speaking, it is both a satisfiability and a validity checker, displaying the semantic properties of a formula as a truth table does, but able to work more efficiently (in terms of the number of branches) than the latter method. The key to this efficiency lies in the order of the variables introduced in an ST. In what follows we present a method of constructing such variable orders and examine our approach in an experimental setting.

4 Implementation

The main functionality of the implementation described in this section is the construction of an ST for a formula according to an instruction provided by the user. If required, it can also produce all possible instructions for a given formula and build all the STs according to them. In our research we have mainly used the second possibility.

The implemented algorithm generates non-canonical, possibly irregular STs. Let us start with some basics. Three main datatypes are employed: a standard, recursively defined formula type, For, used to represent propositional formulas; the type MF (Maybe For), whose values Just B and Nothing express the fact that the synthesis of a given formula on a given branch was successful (Just) or not (Nothing); and the type of trees imported from Data.Tree. Thus every ST is represented as a value of type Tree [MF] (with nodes of the form Node [MF] [Tree [MF]]), that is, a tree labelled with lists of MF; a sketch of these types and auxiliary functions is given after the list of steps below. We employed such a general structure having in mind possible extensions to non-classical logics (for \(\mathsf {CPL}\) a binary tree is sufficient). The algorithm generating all possible STs for a given formula consists of the following steps:

1. We start by performing a few operations on the goal-formula A:

   (a) a list of all components of A and all components of \(\lnot A\), and a separate list of the variables occurring in A (atoms A), are generated;

   (b) the first list is sorted in such a way that all components of a given formula in that list precede it (sort A).

2. After this initial step, based on the list atoms A, all possible instructions for the construction of an ST for A are generated (allRules (atoms A)).

3. For each instruction from allRules (atoms A) we build an ST using the following strategy, called ‘compulsory’:

   (a) after each introduction of a literal (by (cut)) we try to synthesize (by the other rules) as many formulas from sort A as possible;

   (b) if no synthesizing rule is applicable, we look into the instruction to introduce the appropriate literal, and we go back to (a). Let us note that \(\mathcal {T}_1\), \(\mathcal {T}_2\), \(\mathcal {T}_5\) are constructed according to this strategy.

4. Lastly, we generate a CSV file containing some basic information about each generated tree: i.a. the number of nodes and whether the tree is a proof.
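A sketch of the datatypes just described; the type definitions follow the description above (in Data.Tree, Tree a = Node a [Tree a]), while the bodies of atoms and allRules are our guesses at the functions of those names:

```haskell
import Data.Tree (Tree)
import Data.List (nub, permutations)

data For = Var String | Neg For | Imp For For
  deriving (Eq, Show)

-- Just B: formula B was synthesized on the branch; Nothing: it was not
type MF = Maybe For

-- an ST: a tree labelled with lists of MF
type ST = Tree [MF]

-- the distinct variables occurring in a formula
atoms :: For -> [For]
atoms (Var x)   = [Var x]
atoms (Neg b)   = atoms b
atoms (Imp b c) = nub (atoms b ++ atoms c)

-- all possible instructions: the permutations of atoms A
allRules :: [For] -> [[For]]
allRules = permutations
```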

Please observe that the length of a single branch is linear in the size of the formula; this follows from the fact that sort A contains only the components of A. On the other hand, an ‘outburst’ of computational complexity enters at the level of the number of STs. In general, if k is the number of distinct variables in a formula A, then for \(k=3\) there are 12 different canonical STs, while for \(k=4\) and \(k=5\) this number is, respectively, 576 and 1,688,800. In the case of \(k=6\) the number of canonical STs per formula exceeds \(10^{12}\), and this approach is no longer feasible.

The Haskell implementation together with the necessary documentation is available at https://ddsuam.wordpress.com/software-and-data/.

5 dp-Measure and the Rest of Our Toolbox

As we have already observed, in order to construct an optimal ST for a given formula one needs to make a clever choice of the literals to start with. The following function was defined to facilitate such choices. It assigns a rational value from the interval \(\langle 0;1 \rangle \) to each occurrence of a literal in the syntactic tree of a formula A (in fact, it assigns values to all elements of \(\mathsf {Comp}(A)\)). Intuitively, the value reflects the derivational power of the literal in synthesizing A.

The first case of the equation in Definition 3 is there to make the function total on \(\mathsf {Form}\times \mathsf {Form}\); it also corresponds to the intended meaning of the defined measure: if \(B \notin \mathsf {Comp}(A)\), then B is of no use in deriving A. The second case expresses the starting point: to calculate the values of dp(A,B) for atomic B, one needs to assign \(dp(A,A)=1\); the value is then propagated down along the branches of the formula’s syntactic tree. Dividing the value a by 2 in the fourth line reflects the fact that both components of an \(\alpha \)-formula are needed to synthesize the formula. In order to use the measure, we need to calculate it for both A and \(\lnot A\); this follows from the fact that we do not know in advance whether A or \(\lnot A\) will be synthesized on a given branch.

Definition 3

\(dp: \mathsf {Form}\times \mathsf {Form}\longrightarrow \langle 0;1\rangle \)

\(dp(A,B)= {\left\{ \begin{array}{ll} 0 & \text {if } B\not \in \mathsf {Comp}(A),\\ 1 & \text {if } B=A,\\ a & \text {if } dp(A,\lnot \lnot B)=a,\\ \frac{a}{2} & \text {if } B\in \{C,\lnot D\} \text { and } dp(A,\lnot (C \rightarrow D))=a,\\ a & \text {if } B \in \{\lnot C,D\} \text { and } dp(A,C \rightarrow D)=a. \end{array}\right. }\)

Example 4

A visualization of the calculation of dp for the formulas B, C from Examples 2 and 3, and for \(D=(p \rightarrow \lnot p) \rightarrow p\).

[Figure: dp-values on the syntactic trees of B, C, D and of their negations]

As one can see from Example 4, the effect of applying the dp measure to a formula and its negation is a collection of values that need to be aggregated in order to obtain a clear instruction for an ST construction. However, some conclusions can be drawn already from the above example. It seems clear that the value \(dp(\,p \rightarrow (q \rightarrow p),p\,)=1\) corresponds to the fact that p is sufficient to synthesize the whole formula (as witnessed by \(\mathcal {T}_1\), see Example 1). So is the case with \(\lnot p\). On the other hand, even though \(\lnot q\) is sufficient to synthesize the formula, q is not (see \(\mathcal {T}_2\), Example 1); hence the choice between p and q is plain. But it seems to be the only obvious choice at the moment. In the case of the second formula, every literal gets the same value: 0.5. What is more, in the case of longer formulas a situation depicted by the rightmost syntactic trees is very likely to happen: we obtain \(dp(D,p)=0.5\) twice (since dp works on occurrences of literals), and \(dp(\lnot D,\lnot p)=0.5\) three times.
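To see where these values come from, unfold Definition 3 for \(\lnot D = \lnot ((p \rightarrow \lnot p) \rightarrow p)\):

$$dp(\lnot D, \lnot D) = 1; \qquad dp(\lnot D,\, p \rightarrow \lnot p) = dp(\lnot D, \lnot p) = \tfrac{1}{2} \quad \text{(fourth clause)};$$

and the fifth clause, applied to \(p \rightarrow \lnot p\), passes the value \(\tfrac{1}{2}\) on to both of its components, which are again occurrences of \(\lnot p\). Hence each of the three occurrences of \(\lnot p\) in the syntactic tree of \(\lnot D\) receives the value 0.5.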

In the aggregation of the dp-values we use the parametrised Hamacher s-norm, defined for \(a,b \in \langle 0;1 \rangle \) as follows:

$$a\, \mathbf {s}_{\lambda }\, b = \frac{a+b-ab-(1-\lambda )ab}{1-(1-\lambda )ab}$$

for which we have taken \(\lambda =0.1\), as this value turned out to give the best results. The Hamacher s-norm can be seen as a fuzzy disjunction; it is commutative and associative, hence it is straightforward to extend its application to an arbitrary finite number of arguments. For \(a=b=c=0.5\) we obtain:

$$a\, \mathbf {s}_{\lambda }\, b \approx 0.677,\quad \text {and } (a\, \mathbf {s}_{\lambda }\, b)\, \mathbf {s}_{\lambda }\, c \approx 0.768$$
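For instance, the first of these values unfolds as

$$0.5\, \mathbf {s}_{0.1}\, 0.5 = \frac{0.5+0.5-0.25-0.9 \cdot 0.25}{1-0.9 \cdot 0.25} = \frac{0.525}{0.775} \approx 0.677.$$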

The value of this norm is calculated for a formula A and a literal l by taking the dp-values dp(A,l) for each occurrence of l in the syntactic tree of A. This value will be denoted by ‘h(A,l)’; in case there is only one value dp(A,l), we take \(h(A,l)=dp(A,l)\). Hence, referring to the above Example 4, we have e.g. \(h(B,p)=1\), \(h(\lnot B,\lnot p)=0.25\), \(h(\lnot D,\lnot p)\approx 0.768\).

Finally, function H is defined for variables, not their occurrences, in formula A as follows:

$$H(A,p_i) = \frac{\max (h(A,p_i),h(\lnot A,p_i)) + \max (h(A,\lnot p_i),h(\lnot A,\lnot p_i))}{2}$$

An important property of this apparatus is that for \(0<a,b<1\) we have \(a\, {\textbf {s}}_{0.1}\, b > \max \{a,b\}\), and thus h(A,l) and \(H(A,p_i)\) are sensitive to the number of aggregated elements. Another desirable feature of the introduced functions is that \(h(A,p_i)=1\) indicates that one can synthesize A on a branch starting with \(p_i\) without further applications of (cut); furthermore, \(H(A,p_i)=1\) indicates that both \(p_i\) and \(\lnot p_i\) have this property.

Let us stress that the values of dp, h and H are very easy to calculate. Given a formula A, we need to assign a dp-value to each of its components, and the number of components is linear in the length of A. On the other hand, the information gained by these calculations is sometimes not sufficient. The assignment \(dp(A,p_i)=2^{-m}\) says only that A can be built from \(p_i\) and m other components of A, but it gives us no clue as to which components are needed. In Example 4, H works perfectly, as we have \(H(B,p)=1\) and \(H(B,q)=0.625\); hence H indicates the following instruction for the construction of an ST: \(\{\langle p,q \rangle \}\). Unfortunately, in the case of formula C we have \(H(C,p)=H(C,q)=H(C,r)=0.5\), hence a more sophisticated solution is needed.
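The whole toolbox fits in a few lines of Haskell. The following sketch is our illustration (the names dpList, h and bigH are ours, not the implementation's); it reproduces the values quoted above, e.g. \(H(B,p)=1\) and \(H(B,q)=0.625\):

```haskell
data For = Var String | Neg For | Imp For For
  deriving (Eq, Show)

-- Definition 3, computed per occurrence: start from dp(A,A) = 1 and
-- propagate down the syntactic tree, emitting one (component, value)
-- pair for every occurrence of a component.
dpList :: For -> [(For, Double)]
dpList a = go a 1
  where
    go f v = (f, v) : case f of
      Neg (Neg b)   -> go b v                             -- third case
      Neg (Imp c d) -> go c (v / 2) ++ go (Neg d) (v / 2) -- fourth case
      Imp c d       -> go (Neg c) v ++ go d v             -- fifth case
      _             -> []

-- the parametrised Hamacher s-norm
s :: Double -> Double -> Double -> Double
s lam a b = (a + b - a*b - (1 - lam)*a*b) / (1 - (1 - lam)*a*b)

-- h(A,l): fold the s-norm over the dp-values of all occurrences of l in A
h :: For -> For -> Double
h a l = case [v | (f, v) <- dpList a, f == l] of
  []       -> 0
  (v : vs) -> foldl (s 0.1) v vs

-- H(A,p): aggregate over the literals p and ¬p in both A and ¬A
bigH :: For -> For -> Double
bigH a p = ( max (h a p) (h (Neg a) p)
           + max (h a (Neg p)) (h (Neg a) (Neg p)) ) / 2

-- For b = Imp (Var "p") (Imp (Var "q") (Var "p")):
--   bigH b (Var "p") == 1.0  and  bigH b (Var "q") == 0.625
```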

6 Data

At the very beginning of the process of data generation we faced the following general problem: how to make any conclusive inferences about an infinite population (all of \(\mathsf {Form}\)) on the basis of finite data? Considering the methodological problems connected with applying classical statistical inference methods in this context, we limited our analysis to descriptive statistics, exploratory analysis and testing. To make this as informative as possible, we took a ‘big data’ approach: for every formula we generated all possible STs, differing in the order of applications of (cut) on particular branches. In addition, where it was feasible, we generated all possible formulas falling under certain syntactical specifications. The approach is aimed at testing different optimisation methods as well as exploring the data in search of patterns and new hypotheses. The knowledge gained in this way is then used on samples of longer formulas to examine tendencies.

From now on we use l for the length of a formula, k for the number of distinct variables occurring in a formula, and n for the number of all occurrences of variables (leaves, if we think of formulas as trees). In the first stage we examined a dataset containing all possible STs for formulas with \(l=12\) and \(k \leqslant 4\). There are over 33 million different STs already for these modest values; for larger k the data to analyse were simply too big. We generated 242,265 formulas, from which we later removed those with \(k \leqslant 2\) and/or \(k=n\), as the results for them were not interesting. In the case of further datasets we also generated all possible STs, but the formulas were longer and they were randomly generated. And so we considered (i) 400 formulas with \(l=23, k=3\), (ii) 400 formulas with \(l=23, k=4\), (iii) 100 formulas with \(l=23, k=5\). In all cases \(9 \leqslant n \leqslant 12\); this value is to be combined with the number of occurrences of negation in a formula—the smaller n, the more occurrences of negation.

Having generated all possible STs for a formula, we could simply check the optimal ST size for this formula. The idea was to look for possible relations between, on the one hand, instructions producing small STs and, on the other hand, properties of formulas that are easy to calculate, like dp or the numbers of occurrences of variables. The first dataset included only relatively small formulas; however, with all possible formulas of a given type available, it was possible e.g. to track various types of ‘unusual’ behaviour of formulas and all possible problematic issues regarding the optimisation methods, which could remain unnoticed if only random samples of formulas were generated. In the case of randomly generated formulas the ‘special’ or ‘difficult’ types of formulas may not be tracked (as the probability of drawing them may be small), but instead we get an idea of an ‘average’ formula, or of the average behaviour of the optimisation methods. By generating all the STs, in turn, we gained access to full information not only about the regular but also the irregular STs, which is the basis for determining the set of optimal STs and for the evaluation of the optimisation methods.

7 Data Analysis and a Discussion of Results

In this section we present some results of analyses performed on our data. The main purpose of the analyses is to test the effectiveness of the function H in terms of indicating a small ST. Moreover, we performed different types of exploratory analysis on the data, aiming at understanding the variation of size among all STs for different formulas, and how it relates to the effectiveness of H.

Most results will be presented for the five combinations of the values of l and k in our data, that is, \(l=12, k\in \{3,4\}\) and \(l=23,k\in \{3,4,5\}\); however, some results will be presented with the values of \(k=3\) and \(k=4\) grouped together (where the difference between them is insignificant) and the charts are presented only for \(k\geqslant 4\).

We will examine the variation of size among STs using a range statistic: by the range of the size of STs for a formula A (ST range, for short) we mean the difference between the sizes of a maximal and a minimal ST; this value indicates the possible room for optimization. The size of a maximal ST is bounded by the size of a canonical ST for a given formula, which depends only on k. For \(k=4\) a canonical ST has 16 branches; for \(k=5\) it has 32 branches.

Fig. 1. Distribution of the difference between the size of a maximal and that of a minimal ST for formulas with \(k=4,5\).

The histograms in Fig. 1 present the distributions of the ST range for formulas with \(k=4\) and \(k=5\). The rightmost bar in the histogram for \(l=23, k=5\) shows that for 5 (among 100) formulas there are STs with only two branches, whereas the maximal STs for these formulas have 32 branches. We can also read from the histograms that for formulas with \(k=4\) the ST range of some formulas is equal to 0 (\(7.9\%\) of formulas with \(l=12\) and \(3.5\%\) with \(l=23\)), which means that all their STs have the same size. We have decided to exclude these formulas from the results of the tests of efficiency of H, as they leave no room for optimization. However, as can be seen in the histogram, there were no formulas of this kind among those with \(k=5\). This indicates that with the increase of k the internal differentiation of the set of STs for a formula increases as well, leading to a smaller share of formulas with a small ST range.

Two more measures relating to the distribution of the size of STs may be of interest. The first is the share of formulas for which no regular ST is of optimal size—it indicates how wrong we can be in restricting attention to regular STs only. The second is the percentage share of optimal STs among all STs for a given formula; it gives an idea of the chance of picking an optimal ST at random. Table 2 presents both values for formulas depending on k and l (let us recall that formulas with ST range equal to 0 are excluded from the analysis). In both cases we can clearly see a tendency with growing k. As was to be expected, the table shows that the average share of optimal STs depends on the value of k rather than on the size of the formula. This is understandable—as the number of branches depends only on k, the length of a formula translates into the length of branches, and the latter is linear in the former. In a way, this explains why the results are almost identical when the size of STs is calculated in terms of nodes rather than branches (as we mentioned above, the overall correlation between the two measures makes the choice between them irrelevant).

Table 2. Row A: the share of formulas that do not have a regular ST of optimal size. Row B: the share of optimal STs among all STs for a formula; this was first calculated for each formula, then averaged over all formulas in a given set.

We can categorise the output of the function H into three main classes. In the first case, the values assigned to variables by H order the variables strictly, which results in one specific instruction for the construction of a regular ST. The overall share of such unique indications was very high: \(70.9\%\) for formulas with \(l=12\), \(92.0\%\) for \(l=23,k=3,4\), and \(72.0\%\) for \(k=5\). The second possibility is that H assigns the same value to each variable; in this case we gain no information at all (let us recall that we have excluded the only cases that could justify such assignments, that is, the formulas for which every ST is of the same size). The share of such formulas in our datasets was small: \(0.6\%\) for \(l=12\), \(0.1\%\) for \(l=23,k=3,4\) and \(0\%\) for \(k=5\), suggesting that it tends to fall as k rises. The third possibility is that the ordering is not strict, yet some information is gained; in this case the value of H is the same for some, but not all, variables.

The methodology used to assess the effectiveness of H is quite simple. We assume that every indication must be a single regular instruction; hence, in the case of formulas of the second and third kinds described above, we use additional criteria in order to obtain a strict ordering. If H outputs the same value for some variables, we first order the variables by the number of occurrences in the formula; if the ordering is still not strict, we give priority to variables for which the sum of the depths of all occurrences of literals in the syntactic tree is smaller; finally, where the above criteria do not provide a strict ordering, the order is chosen at random.

We used three evaluating functions to assess the quality of indications. Each function takes as arguments a formula and the ST for this formula indicated by our heuristics. The first function (\(F_1\) in Table 3) outputs 1 if the indicated ST is of optimal size, 0 otherwise. The second function (\(F_2\) in Table 3) outputs the difference between the size of the indicated ST and the optimal size. The third function is called proximity to optimal tableau, \(\mathsf {POT}_A\) in symbols:

$$\mathsf {POT}_A(\mathcal {T}) = 1 - \frac{|\mathcal {T}|-min_A}{max_A-min_A}$$

where \(\mathcal {T}\) is the ST for formula A indicated by H, \(|\mathcal {T}|\) is the size of \(\mathcal {T}\), \(max_A\) is the size of a maximal ST for A, and \(min_A\) is the size of an optimal ST for A. Later on we skip the relativization to A. Let us observe that the value \(\frac{|\mathcal {T}|-min}{max-min}\) represents the error of the indication relative to the ST range of a formula, and in this sense \(\mathsf {POT}_A\) can be considered a standardized measure of the quality of indication. Finally, the values of each of the three evaluating functions were calculated for sets of formulas by taking averages over all formulas in the set.
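A sketch of the three evaluating functions, with sizes measured in branches; the arguments minA, maxA and the size t of the indicated ST are assumed to be given (the names are ours):

```haskell
f1, f2, pot :: Double -> Double -> Double -> Double
f1  minA _    t = if t == minA then 1 else 0      -- 1 iff the indicated ST is optimal
f2  minA _    t = t - minA                        -- error in number of branches
pot minA maxA t = 1 - (t - minA) / (maxA - minA)  -- proximity to optimal tableau
```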

Table 3. The third column gives the number of formulas satisfying the characteristics presented in the first and second columns. The next three columns display values averaged over these sets. \(F_1\) indicates how often we indicate an optimal ST. \(F_2\) reports the error of our indication, calculated as the difference in size between the indicated ST and an optimal one. Finally, POT indicates proximity to an optimal ST in a standardized way.

The results of the three functions presented in Table 3 show that optimal STs are indicated less often for formulas with greater k; however, the POT values seem to remain stable across all data, indicating that, on average, the proximity of the indicated ST to the optimal ones does not depend on k or l.

Further analysis showed that the factor that most influenced the efficiency of our methodology was whether there is at least one value 1 among the dp-values of literals for a formula A. We shall write ‘Max(dp) = 1’ if this is the case, and ‘Max(dp) < 1’ otherwise (we skip the relativisation to A for simplicity). For formulas with Max(dp) = 1, the results of the evaluating functions were much better; for example, the value of the \(\mathsf {POT}\) function for formulas with \(l=12\) was 0.979 if Max(dp) = 1, and 0.814 for those with Max(dp) < 1; in the case of formulas with \(l=23, k=3,4\) these values were 0.968 and 0.869, respectively, and for formulas with \(l=23, k=5\) they were 0.974 and 0.901, respectively. This shows that our methodology works significantly worse if Max(dp) < 1; on the other hand, if Max(dp) = 1, the dp measure works very well. It should also be pointed out that the difference between the \(\mathsf {POT}\) values for the two groups is smaller for formulas with greater l and k. Figure 2 presents a scatter plot that gives an idea of the whole distribution of the values of the \(\mathsf {POT}\) function in relation to the ST range. Each formula is represented by a point on the plot, the colours additionally indicating whether Max(dp) < 1. The chart suggests, similarly to Table 3, that the method continues to work well as the values of l and k rise, indicating STs that are on average equally close to the optimal ones.

Fig. 2. Distribution of the difference between the indicated and an optimal ST in relation to the ST range. Every point corresponds to a formula; the points are slightly jittered in order to improve readability. Each chart corresponds to different data, formulas with \(k=3\) being excluded; additionally, the colour indicates whether Max(dp) = 1 for a formula.

One can point to two possible explanations of the fact that our methodology works worse for formulas with Max(dp) < 1. Firstly, if, e.g., \(dp(A,p) = 2^{-m}\), we only obtain the information that, apart from p, m more occurrences of components of A are required in order to synthesize the whole formula. Secondly, the function H neglects the complex dependencies between the various aggregated occurrences of a given variable, taking into account only the number of occurrences of literals in an aggregated group. However, considering the very low computational cost of the method based on the dp-values and the function H, the outlined framework seems to provide a good heuristic for indicating small STs. Methods reflecting more aspects of the complex structure of logical formulas would likely require much more computational resources.

On a final note, we would like to add that exploration of the data allowed us to study properties of formulas that went beyond the scope of the optimisation of STs. The data were used in a similar way as in so-called Experimental Mathematics, where numerous instances are analysed and visualized in order to, e.g., gain insight, search for new patterns and relationships, test conjectures and introduce new concepts (see e.g. [1]).

8 The Pigeonhole Principle

We end by considering the propositional version of the principle introduced by Cook and Reckhow in [3, p. 43]. In the field of proof complexity the principle was used to prove that resolution is intractable, that is, that any resolution proof of the propositional pigeonhole principle must be of exponential size (wrt the size of the formula). This was proved by Haken in [10]; see also [2].

Here is \(PHP_m\) in the propositional version:

$$\begin{aligned} \bigwedge _{0\leqslant i\leqslant m}\ \bigvee _{0\leqslant j<m} p_{i,j} \rightarrow \bigvee _{0\leqslant i< n \leqslant m}\ \bigvee _{0\leqslant j < m} (p_{i,j} \wedge p_{n,j}) \end{aligned}$$

where \(\bigwedge \) and \(\bigvee \) stand for generalized conjunction and disjunction (respectively), with the range indicated beneath.
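For concreteness, here is a generator of \(PHP_m\). Since \(\mathcal {L}_\mathsf {CPL}\) contains only \(\lnot \) and \(\rightarrow \), generalized conjunction and disjunction would have to be treated as defined connectives; the sketch below (our illustration) therefore uses an extended formula type:

```haskell
data PFor = PVar Int Int   -- p_{i,j}: pigeon i sits in hole j
          | PAnd [PFor]    -- generalized conjunction
          | POr  [PFor]    -- generalized disjunction
          | PImp PFor PFor
  deriving (Eq, Show)

-- PHP_m: if each of the m+1 pigeons 0..m sits in one of the m holes
-- 0..m-1, then two distinct pigeons i < n share some hole j.
php :: Int -> PFor
php m = PImp (PAnd [POr [PVar i j | j <- [0 .. m-1]] | i <- [0 .. m]])
             (POr  [PAnd [PVar i j, PVar n j]
                   | i <- [0 .. m], n <- [i+1 .. m], j <- [0 .. m-1]])
```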

The pigeonhole principle is constructed with perfect symmetry among the roles played by the consecutive variables. Each variable has the same number of occurrences in the formula, each of them gets the same value under H, and their occurrences lie at the same depths of the syntactic tree. All this means that on our account we can only suggest a random regular ST. However, it is worth noticing that, first, H behaves consistently with the structure of the formula, and second, the result is still attractive. In Table 4 the fourth column presents the size of the ST indicated by our heuristics, that is, in fact, generated by a random ordering of the variables. It is to be contrasted with the number \(2^{k}\) in the last column, describing the size of a canonical ST for the formula, which is at the same time the number of rows in a truth table for the formula. The minimal STs for the formulas were found with pen and paper, and they are irregular.

Table 4. The pigeonhole principle

9 Summary and Further Work

We presented the proof method of Synthetic Tableaux for CPL and explained how the efficiency of tableau construction depends on the choice of variables to which (cut) is applied. We defined possible algorithms for choosing the variables and tested their efficiency experimentally.

Our plan for the next stage of research is well defined: to implement heuristics capable of producing instructions for irregular STs. We already have an algorithm, as yet untested.

As far as proof-theoretical aims are concerned, the next task is to extend and adjust the framework to the first-order level, based on the already described ST system for first-order logic [14]. We also wish to examine the efficiency of our indications on propositional non-classical logics for which the ST method exists (see [20, 22]). In the area of data analysis, another possible step would be to perform more complex statistical analyses using, e.g., machine learning methods.