
1 Background on General Elimination Rules

Standard natural deduction rules for Int (intuitionistic predicate logic) in the style of Gentzen [9] and Prawitz [24] are presumed to be familiar. The theory of cut-elimination for sequent calculus rules is very clear: whether a derivation in a sequent calculus is cut-free or not is easily defined, according to the presence or absence of instances of the Cut rule. For natural deduction, normality is a less clear concept: there are several inequivalent definitions (including variations such as “full normality”) in the literature. For implicational logic it is easy; but rules such as the elimination rule for disjunction cause minor problems with the notion of “maximal formula occurrence” (should one include the permutative conversions or not?), and more problems when minor premisses have vacuous discharge of assumptions.

One proposed solution, albeit partial, is the uniform use of general elimination rules, i.e. GE-rules. These can be motivated in terms of Prawitz’s inversion principle Footnote 1: “the conclusion obtained by an elimination does not state anything more than what must have already been obtained if the major premiss of the elimination was inferred by an introduction” [25, p. 246]. Normality is now the simple idea [39] that the major premiss of each elimination step should be an assumption; see also [13, 36].

The standard elimination rules for disjunction, absurdity and existential quantification are already GE rules (here and below, square brackets mark assumptions discharged by the rule):

\[ \dfrac{A \vee B \qquad \begin{matrix}[A]\\ \vdots \\ C\end{matrix} \qquad \begin{matrix}[B]\\ \vdots \\ C\end{matrix}}{C}\ \vee E \qquad \dfrac{\bot }{C}\ \bot E \qquad \dfrac{\exists x A \qquad \begin{matrix}[A[y/x]]\\ \vdots \\ C\end{matrix}}{C}\ \exists E \]

(with y fresh in \(\exists E\)) and the same pattern was proposed (as a GE-rule) in the early 1980s for conjunction

\[ \dfrac{A \wedge B \qquad \begin{matrix}[A,\,B]\\ \vdots \\ C\end{matrix}}{C}\ \wedge GE \]
by various authors, notably Prawitz [26, 27], Martin-Löf [15] and Schroeder-Heister [30], inspired in part by type theory (where conjunction is a special case of the \(\Sigma \)-type constructor, with \(A \wedge B =_{\textit{def}} \Sigma (A,B)\) whenever B(x) is independent of x) and (perhaps) in part by linear logic [10] (where conjunction appears in two flavours: multiplicative \(\otimes \) and additive &).

To this one can add GE-rules for implicationFootnote 2 and universal quantification:

\[ \dfrac{A \supset B \qquad A \qquad \begin{matrix}[B]\\ \vdots \\ C\end{matrix}}{C}\ \supset \! GE \qquad \dfrac{\forall x A \qquad \begin{matrix}[A[t/x]]\\ \vdots \\ C\end{matrix}}{C}\ \forall GE \]
Rules of the first kind are conveniently called “flattened” [29] (in comparison with Schroeder-Heister’s “higher-level” rules, for which see [30, 32]). López-Escobar [13] distinguishes between the premiss A of \(\supset \!\! GE\) as a “minor” premiss and that of C (assuming B) as a “transfer” premiss.Footnote 3

One thus has a calculus of rules in natural deduction style for Int; such calculi, and their normalisation results, have been studied by von Plato [39], by López-Escobar [13] and by Tennant [36]. With the definition (given above) that a deduction is normal iff the major premiss of every elimination step is an assumption, the main results are:

1. Weak Normalisation (WN): every deduction can be replaced by a normal deduction of the same conclusion from the same assumptions [13, 20, 36, 39].

2. Strong Normalisation (SN), for the implicational fragment: an obvious set of rules for reducing non-normal deductions is strongly normalising, i.e. every reduction sequence terminates [12, 13, 36, 37].

3. SN, for the full language: a straightforward extension of the proof of [37] for implicationFootnote 4; also, the proofs for implication “directly carry over” [12] to a system with conjunctions and disjunctions. An argument (using the ordinary elimination rule for implication) is given in [35] for the rules for implication and existential quantification, with the virtue of illustrating in detail how to handle GE rules where the Tait–Martin-Löf method of induction on types familiar from [11] is not available. See also [13].

4. Some straightforward arguments for normalisation (by induction on the structure of the deduction) [40].

5. A 1-1 correspondence with intuitionistic sequent calculus derivations [20, 39].

6. Some interpolation properties [17].

7. Extension of the normalisation results to classical logic [41].

Despite the above results, there are some disadvantages:

1. Poor generalisation of the GE rule for implication to the type-theoretic constant \(\varPi \), of which \(\supset \) can be treated as a special case [15]: details below in Sect. 3.

2. Too many deductions, as in sequent calculus. Sequent calculus has (for each derivable sequent) rather too many derivations in comparison to natural deduction, since derivations often differ only by permutations, all of which translate to the same ordinary natural deduction; this is why focused [aka “permutation-free”] sequent calculi [5, 6] have advantages. The GE-rules have the same feature, which interferes with rather than assists in root-first proof search.

3. (For some complex constants, if one adopts the methodology beyond the basic intuitionistic ones) a “disharmonious mess” [4]: details below in Sect. 4.4.

4. No SN results (yet, in general) for GE-rules for arbitrarily complex constants.

2 Is Bullet a Logical Constant?

Read [28] has, following a suggestion of Schroeder-Heister (with Appendix B of Prawitz’s [26] and Ekman’s ParadoxFootnote 5 [7] in mind), proposed as a logical constant a nullary operator \(\bullet \) (aka R, for “Russell”) with the single (but impure) introduction rule

\[ \dfrac{\begin{matrix}[\bullet ]\\ \vdots \\ \bot \end{matrix}}{\bullet }\ \bullet I \]
The GE-rule justified by this (along the same lines as for implication) is then

\[ \dfrac{\bullet \qquad \bullet \qquad \begin{matrix}[\bot ]\\ \vdots \\ C\end{matrix}}{C}\ \bullet GE \]
which, given the usual \(\bot E\) rule and the unnecessary duplication of premisses, can be simplified to

\[ \dfrac{\bullet }{\bot }\ \bullet E \]
So, by this \(\bullet E\) rule, the premiss of the \(\bullet I\) rule is deducible, hence \(\bullet \) is deducible, hence \(\bot \) is deducible.
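Spelled out (our reconstruction; the discharge label is in parentheses): from the assumption \(\bullet \), the rule \(\bullet E\) yields \(\bot \); \(\bullet I\) then discharges that assumption to yield \(\bullet \); and a final \(\bullet E\) yields \(\bot \) outright:

\[ \dfrac{\dfrac{\dfrac{[\bullet ]^{1}}{\bot }\ \bullet E}{\bullet }\ \bullet I\,(1)}{\bot }\ \bullet E \]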

There is however a weakness (other than just that it leads to inconsistency) in the alleged justification of \(\bullet \) as a logical constant: it is a circularity. We follow Martin-Löf [15, 16] and Dummett [2] in accepting that we understand a proposition when we understand what it means to have a canonical proof of it, i.e. what forms a canonical proof can take. In the case of \(\bullet \), there is a circularity: the introduction rule gives us a canonical proof only once we have a proof of \(\bot \) from the assumption of \(\bullet \), i.e. have a method for transforming arbitrary proofs of \(\bullet \) into proofs of \(\bot \). The reference here to “arbitrary proofs of \(\bullet \)” is the circularity.

There are similar ideas about type formers, and it is instructive to consider another case, an apparent circularity: the formation rule (in [15]) for the type N of natural numbers. That is a type that we understand when we know what its canonical elements are; these are 0 and, when we have an element n of N, the term s(n). The reference back to “an element n of N” looks like a circularity of the same kind; but it is rather different—we don’t need to grasp all elements of N to construct a canonical element by means of the rule, just one of them, namely n.

A formal treatment of this issue has long been available in the type theory literature, e.g. Mendler [18], Luo [14], Coq [1]. We will try to give a simplified version of the ideas. With the convention that propositions are interpreted as types (of their proofs), we take type theory as a generalisation of logic, with ideas and restrictions in the former being applicable to the latter. The simplest recursive case (N) has just been considered and the recursion explained as harmless (despite Dummett’s reservations expressed as his “complexity condition” [2]). What about more general definitions?

The definition of the type N can be expressed as saying that N is the least fixed point of the operator \(\varPhi _N =_{\textit{def}} \lambda X .(1 + X)\), i.e. \(N =_{\textit{def}} \mu X .(1+X)\). Similarly, the type of lists of natural numbers is \(\mu L .(1 + N \times L)\), and the type of binary trees with leaves in A and node labels in B is \(\mu T.A + (T \times B \times T)\). A unary operator definition \(\varPhi =_{\textit{def}} \lambda X .\ldots \) is said to be positive iff the only occurrences of the type variable X in the body \(\ldots \) are positive, where an occurrence of X in the expression \(A \rightarrow B \) is positive (resp. negative) iff it is a positive (resp. negative) occurrence in B or a negative (resp. positive) occurrence in A; a variable occurs positively in itself, and occurs positively (resp. negatively) in \(A+B\) and in \(A \times B\) just where it occurs positively (resp. negatively) in A or in B. A definition of a type as the least fixed point of an operator is then positive iff the operator definition is positive.
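Such least fixed points are precisely what inductive type definitions provide; as a sketch in Coq (our names, chosen to avoid clashes with the standard library):

    Inductive N : Type :=                 (* N = mu X.(1 + X) *)
      | zeroN : N
      | succN : N -> N.

    Inductive listN : Type :=             (* mu L.(1 + N * L) *)
      | nilN  : listN
      | consN : N -> listN -> listN.

    Inductive tree (A B : Type) : Type := (* mu T.(A + (T * B * T)) *)
      | leaf : A -> tree A B
      | node : tree A B -> B -> tree A B -> tree A B.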

Read’s \(\bullet \), then, is defined as \(\mu X .(X \rightarrow \bot )\). This is not a positive definition; the negativity of the occurrence of X in the body \(X \rightarrow \bot \) is a symptom of the circular idea that \(\bullet \) can be grasped once we already have a full grasp of what the proofs of \(\bullet \) might be.

In practice, a stronger requirement is imposed, that the definition be strictly positive, i.e. the only occurrences of the type variable X in the body \(\ldots \) are strictly positive, where an occurrence of X in the expression \(A \rightarrow B \) is strictly positive iff it is a strictly positive occurrence in B; a variable occurs strictly positively in itself, and occurs strictly positively in \(A+B\) and in \(A \times B\) just where it occurs strictly positively in A or in B. A definition of a type as the least fixed point of an operator is then strictly positive iff the operator definition is strictly positive.
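To illustrate the gap between the two conditions (our example, not from the original text): the operator \(\lambda X.((X \rightarrow \bot ) \rightarrow \bot )\) is positive (X occurs negatively in \(X \rightarrow \bot \), hence positively in the whole body) but not strictly positive, since the occurrence is under the left of an arrow; Coq’s positivity checker duly rejects the corresponding definition:

    (* Positive but not strictly positive: rejected by Coq. *)
    Inductive dneg : Prop :=
      dnegI : ((dneg -> False) -> False) -> dneg.
    (* Error: Non strictly positive occurrence of "dneg" ... *)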

With such definitions, it can be shown that strong normalisation (of a suitable set of reductions) holds [18, Chap. 3]; similar accounts appear in [1, 14].

3 The GE-rule for Implication and the Type-Theoretic Dependent Product Type

The present author commented [3] that the general (aka “flattened” [29]) E-rule for implication didn’t look promising because it didn’t generalise to type theory. Here (after 27 years) are the details of this problem: Recall [15] that in the dependently-typed context

\[ x : A \;\vdash\; B(x)\ \mathit{type} \qquad z : \Sigma (A,B) \;\vdash\; C(z)\ \mathit{type} \]
the rule

\[ \dfrac{d : \Sigma (A,B) \qquad x : A,\ y : B(x) \;\vdash\; c(x,y) : C((x,y))}{\mathit{split}(d,c) : C(d)}\ \Sigma E \]
withFootnote 6 semantics \(split((a,b), c) \rightarrow c(a,b)\) is a generalisation of the rule

\[ \dfrac{d : A \wedge B \qquad x : A,\ y : B \;\vdash\; c(x,y) : C}{\mathit{split}(d,c) : C}\ \wedge GE \]
Now, ordinary (but with witnesses) Modus Ponens

\[ \dfrac{f : A \supset B \qquad a : A}{\mathit{ap}(f,a) : B} \]
has, in the dependently-typed context

\[ x : A \;\vdash\; B(x)\ \mathit{type} \]
the generalisation (in which \(ap2(f,a)\) is often just written as \( f\ a\) or \((f\ a)\)):

\[ \dfrac{f : \varPi (A,B) \qquad a : A}{\mathit{ap2}(f,a) : B(a)}\ \varPi E \]
(with \(\varPi (A,B)\) written as \(A\! \supset \! B\) whenever B(x) is independent of x); but the “flattened” GE rule

\[ \dfrac{f : A \supset B \qquad a : A \qquad y : B \;\vdash\; c(y) : C}{\mathit{ap3}(f,a,c) : C} \]
with semantics \(ap3(\lambda (g), a,c) \rightarrow c(g(a))\) doesn’t appear to generalise:

\[ \dfrac{f : \varPi (A,B) \qquad a : A \qquad y : B(a) \;\vdash\; c(y) : C(\,?\,)}{\mathit{ap3}(f,a,c) : C(f)} \]
in which, note the question-mark—what should go there? In the context y : B(a), the only ingredient is y, which won’t do—it has the wrong type. Addition of an assumption such as x : A (and making c depend on it, as in \(c(x,y)\)) doesn’t help.

One solution is the system of higher-level rules of Schroeder-Heister [30]. Our own preference, to be advocated after a closer look at flattened GE-rules, is for implication (and universal quantification) to be taken as primitive, with Modus Ponens and the \(\varPi E\) rule taken as their elimination rules, with justifications as in [15].

4 GE-Rules in General

The widespread idea that the “grounds for asserting a proposition” collectively form some kind of structure which can be used to construct the assumptions in the minor premiss(es)Footnote 7 of a GE-rule is attractive, as illustrated by the idea that, where two formulae A, B are used as the grounds for asserting \(A \wedge B\), one may make the pair A, B the assumptions of the minor premiss of \(\wedge GE\). An example of this is López-Escobar’s [13], which gives I-rules and then GE-rules for implicationFootnote 8 and disjunction, with the observation [13, p. 417] that:

Had the corresponding I-rule had three “options” with say 2, 3 and 5 premises respectively, then there would have been \(2 \times 3 \times 5\) E-rules corresponding to that logical atom.Footnote 9 Also had there been an indirectFootnote 10 premise, say \(\nabla \mathfrak {D} / \mathfrak {E}\), in one of the options then it would contribute a minor premise with conclusion \(\mathfrak {E}\) and a transfer premise with discharged sentence \(\mathfrak {D}\) to the appropriateFootnote 11 E-rule.

In practice, there is an explosion of possibilities, which we analyse in order as follows:

1. a logical constant, such as \(\bot \), \(\wedge \), \(\vee \), \(\equiv \) or \(\oplus \) (exclusive or), can be introduced by zero or more rules;

2. each of these rules can have zero or more premisses, e.g. \(\top I\) has zero, \(\supset \! I\) and each \(\vee I_i\) have one, \(\wedge I\) has two;

3. each such premiss may discharge zero or more assumptions (as in \(\supset \! I\));

4. each such premiss may abstract over one or more variables, as in \(\forall I\);

5. and a premiss may take a term as a parameter (as in \(\exists I\)).

It is not suggested that this list is exhaustive: conventions such as those of substructural logic about avoiding multiple or vacuous discharge will extend it, as would recursion; but it is long enough to illustrate the explosion. The paper [8] attemptedFootnote 12 to deal with all these possibilities and carry out a programme of mechanically generating GE-rules from a set of I-rules with results about harmony.

4.1 Several I-Rules

Where a logical constant (such as \(\vee \)) is introduced by several alternativeFootnote 13 rules, one can formulate an appropriate GE-rule as having several minorFootnote 14 premisses, one for each of the I-rules, giving a case analysis. This is very familiar from the case of \(\vee \) and the usual \(\vee E\) rule:

\[ \dfrac{A \vee B \qquad \begin{matrix}[A]\\ \vdots \\ C\end{matrix} \qquad \begin{matrix}[B]\\ \vdots \\ C\end{matrix}}{C}\ \vee E \]
so an appropriate generalisation for \(n \ge 0\) alternative I-rules is to ensure that “the GE-rule” has n minor premisses. This works well for \(\bot \), with no I-rules: the \(\bot E\)-rule, as in [9, 24], has no minor premisses.Footnote 15

4.2 I-Rule Has Several Premisses

Now there are two possibilities following the general idea that the conclusion of a GE-rule is arbitrary. Let us consider the intuitionistic constant \(\wedge \) (with its only I-rule having two premisses) as an example. The first possibility is as illustrated earlier: the rule

\[ \dfrac{A \wedge B \qquad \begin{matrix}[A,\,B]\\ \vdots \\ C\end{matrix}}{C}\ \wedge GE \]
The second is to have two GE-rules:

\[ \dfrac{A \wedge B \qquad \begin{matrix}[A]\\ \vdots \\ C\end{matrix}}{C} \qquad \dfrac{A \wedge B \qquad \begin{matrix}[B]\\ \vdots \\ C\end{matrix}}{C} \]
and it is routine to show that the ordinary GE-rule for \(\wedge \) is derivable in a system including these two rules, and vice-versa. Tradition goes for the first possibility; examples below show however that this doesn’t always work and that the second may be required.
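For one direction of that equivalence (our sketch; discharge labels in parentheses), the ordinary \(\wedge GE\) rule is derived by nesting the two rules, the inner step discharging B and the outer step discharging A:

\[ \dfrac{A \wedge B \qquad \dfrac{A \wedge B \qquad \begin{matrix}[A]^{1}\ \ [B]^{2}\\ \vdots \\ C\end{matrix}}{C}\ (2)}{C}\ (1) \]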

4.3 Premiss of I-Rule Discharges Some Assumptions

Natural deduction’s main feature is that assumptions can be discharged, as illustrated by the I-rule for \(\supset \) and the E-rule for \(\vee \). This raises difficulties for the construction of the appropriate GE-rules: Prawitz [26] got it wrong (corrected in [27]), Schroeder-Heister [30] gave an answer in the form of a system of rules of higher level, allowing discharge not just of assumptions but of rules (which may themselves discharge ...)—but, although much cited, use of this system seems to be modest. As already discussed, an alternative was mentioned (disparagingly) in [3] and (independently) adopted more widely by others [13, 36, 39], the “flattened” GE-rule for \(\supset \) being

\[ \dfrac{A \supset B \qquad A \qquad \begin{matrix}[B]\\ \vdots \\ C\end{matrix}}{C}\ \supset \! GE \]
Let us now consider the position where two premisses discharge an assumption (just one each is enough): consider the logical constant \(\equiv \) with one I-rule, namely

\[ \dfrac{\begin{matrix}[A]\\ \vdots \\ B\end{matrix} \qquad \begin{matrix}[B]\\ \vdots \\ A\end{matrix}}{A \equiv B}\ \equiv \! I \]
According to our methodology, we have two possibilities for the GE-rule; first, have the minor premiss of the rule with the two assumptions B and A being discharged and some device to ensure that there are other premisses with A and B as conclusions. There seems to be no way of doing this coherently, i.e. with A somehow tied to the discharge of B and vice-versa. The alternative is to have two GE-rules, along the lines discussed above for \( \wedge \), and these are clearly

\[ \dfrac{A \equiv B \qquad A \qquad \begin{matrix}[B]\\ \vdots \\ C\end{matrix}}{C} \qquad \dfrac{A \equiv B \qquad B \qquad \begin{matrix}[A]\\ \vdots \\ C\end{matrix}}{C} \]
by means of which it is clear that, from the assumption of \(A \equiv B\), one can construct a proof of \(A \equiv B\) using the introduction rule as the last step, implying the “local completeness” of this set of rules in a sense explored by Pfenning and Davies [22]:

\[ \dfrac{\dfrac{A \equiv B \qquad [A]^{1} \qquad [B]^{2}}{B}\ (2) \qquad \dfrac{A \equiv B \qquad [B]^{3} \qquad [A]^{4}}{A}\ (4)}{A \equiv B}\ \equiv \! I\,(1,3) \]
We are thus committed in general to the use of the second rather than the first possibility of GE-rules—the use of two such rules rather than one—when there are two premisses in an I-rule.

4.4 GE Harmony: A Counter-Example

Francez and the present author [8]Footnote 16 developed these ideas (looking also at the analogues of universal and existential quantification) by defining the notion of “GE-harmony” (E-rules are GE-rules obtained according to a formal procedure, of which parts are as described above) and showing that it implied “local intrinsic harmony” (local soundness, i.e. reductions, and local completeness, as illustrated above for \(\equiv \)). The classification in [8] corresponds roughly but not exactly to the different possibilities enumerated above (\(1 \ldots 5\)): “non-combining” (zero or one premiss[es]) or “combining” (more than one premiss) corresponds to possibility 2; “hypothetical” (a premiss with assumptions discharge) or “categorical” (no such discharge) corresponds to possibility 3; “parametrized” (a premiss depends on a free variable) corresponds roughly to a mixture of 4 and 5; “conditional” (e.g. there is a freshness condition) corresponds roughly to 4.

Let us now consider a combination of such ideas, e.g. two I-rules each of which discharges an assumption, e.g. the pair

\[ \dfrac{\begin{matrix}[A]\\ \vdots \\ B\end{matrix}}{A \odot B}\ \odot I_1 \qquad \dfrac{\begin{matrix}[B]\\ \vdots \\ A\end{matrix}}{A \odot B}\ \odot I_2 \]
What is/are the appropriate GE-rule(s)? It/they might be just

\[ \dfrac{A \odot B \qquad A \qquad \begin{matrix}[B]\\ \vdots \\ C\end{matrix}}{C} \]
but that only captures, as it were, the first of the two I-rules (and implies that \((A \odot B) \supset (A \supset B)\), surely not what should be the case); so we have to try also

\[ \dfrac{A \odot B \qquad B \qquad \begin{matrix}[A]\\ \vdots \\ C\end{matrix}}{C} \]
but then these two need to be combined somehow. If into a single rule,Footnote 17 it would be something like

\[ \dfrac{A \odot B \qquad A \qquad \begin{matrix}[B]\\ \vdots \\ C\end{matrix} \qquad B \qquad \begin{matrix}[A]\\ \vdots \\ C\end{matrix}}{C} \]
which is weird; with only the second and last of these premisses, already C can be deduced. The meaning of \(A \odot B\) is thus surely not being captured, whether we go for two GE-rules or just one.

A similar example was given in 1968 by von Kutschera [38, p. 15], with two I-rules for an operator F based on the informal definition \(F(A,B,C) \equiv (A \supset B) \vee (C \supset B)\) but the flattened E-rule failing to capture the definition adequately.

4.5 Another [Counter-]Example

Following an example of Zucker and Tragesser’s [42, p. 506], Olkhovikov and Schroeder-Heister [21] have given as a simpler example the ternary constant \(\star \) with two introduction rules:

\[ \dfrac{\begin{matrix}[A]\\ \vdots \\ B\end{matrix}}{\star (A,B,C)}\ \star I_1 \qquad \dfrac{C}{\star (A,B,C)}\ \star I_2 \]
and the “obvious” GE ruleFootnote 18 thereby justified is:

\[ \dfrac{\star (A,B,C) \qquad A \qquad \begin{matrix}[B]\\ \vdots \\ D\end{matrix} \qquad \begin{matrix}[C]\\ \vdots \\ D\end{matrix}}{D} \]
which is clearly wrong, there being nothing to distinguish it from \(\star (A,C,B)\). Their main point is to show by a semantic argument that there is no non-obvious GE rule for \(\star \), thus defending the “idea of higher-level rules” [30].

4.6 In Other Words

The “flattening” methodology, when either the constant being defined has several introduction rules or one or more of such rules has several premisses, can lead

1. to a number \(({>} 1)\) of GE rules, none of which on its own suffices, and

2. to a “disharmonious mess”, i.e. a failure to capture the correct meaning.

Already there are enough problems, before we start considering the cases where the premiss abstracts over several variables, instantiates a variable as a term or recurses on the constant being defined.

The solution of Schroeder-Heister [30] is to allow rules to discharge rules. We prefer instead to propose that one should adopt the standard solution from (e.g.) Coq [1]: to reject the idea that the rule for handling implication (and other situations where assumptions are discharged) be treated as illustrated above, and instead to take implication (and its generalisation, universal quantification), together with an inductive definition mechanism, as primitive, with traditional “special” elimination rules (e.g. Modus Ponens) but with GE rules allowed elsewhere (e.g. for \(\wedge \) and its generalisations \(\Sigma \) and \(\exists \)). This deals with \(\equiv \); likewise, it deals with \(\odot \) as if it were

\[ (A \supset B) \vee (B \supset A) \]
More precisely, we note that with an introduction rule given in Coq by the inductive definition

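(the original listing is reconstructed here, with illustrative identifier names)

    (* The single I-rule for equivalence, as one constructor. *)
    Inductive equiv (A B : Prop) : Prop :=
      equivI : (A -> B) -> (B -> A) -> equiv A B.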

we obtain as a theorem

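(again a reconstruction, of the induction principle Coq generates)

    equiv_ind : forall A B P : Prop,
      ((A -> B) -> (B -> A) -> P) -> equiv A B -> P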

and similarly for \(\odot \) we have the inductive definition

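(reconstruction: one constructor per I-rule)

    Inductive odot (A B : Prop) : Prop :=
      | odotI1 : (A -> B) -> odot A B
      | odotI2 : (B -> A) -> odot A B.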

and we can obtain as a theorem

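(reconstruction; note the two minor premisses, one per constructor)

    odot_ind : forall A B P : Prop,
      ((A -> B) -> P) -> ((B -> A) -> P) -> odot A B -> P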

Not only can we obtain such theorems, but Coq will calculate them (and several variants) from the definitions automatically. Further details of this approach can be found in [23]. For example, existential quantification can be defined thus (we give also the obtained theorem representing the elimination rule):

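(a sketch along the lines of the Coq standard library’s ex, with illustrative names)

    Inductive ex (A : Type) (P : A -> Prop) : Prop :=
      ex_intro : forall x : A, P x -> ex A P.

    (* The generated elimination principle, essentially the GE rule for
       existential quantification with fresh variable x. *)
    ex_ind : forall (A : Type) (P : A -> Prop) (Q : Prop),
      (forall x : A, P x -> Q) -> ex A P -> Q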

Short shrift is given to \(\bullet \): the attempted definition (a reconstruction) is rejected by the positivity checker:

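    (* bullet = mu X.(X -> False): not strictly positive, so rejected. *)
    Inductive bullet : Prop :=
      bulletI : (bullet -> False) -> bullet.
    (* Error: Non strictly positive occurrence of "bullet" in
       "(bullet -> False) -> bullet". *)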

This pushes the problem (of constructing and justifying elimination rules given a set of introduction rules, and establishing properties like harmony, local completeness and stability) elsewhere: into the same problem for a mechanism of inductive definitions and for the rules regarded as primitive: introduction and (non-general) elimination rules for implication and universal quantification. Apart from the issue of stability, we regard the latter as unproblematic, and the former as relatively straightforward (once we can base the syntax on implication and universal quantification).

To a large extent this approach may be regarded as just expressing first-order connectives using second-order logic, and not very different from Schroeder-Heister’s higher-level rules. The important point is that there are difficulties (we think insurmountable) with trying to do it all without such higher-order mechanisms.

5 Conclusion

The main conclusion is this: although the idea that the “grounds for asserting a proposition” are easily collected together as a unit is attractive, the different ways in which it can be done (disjunctive, conjunctive, with assumption discharge, with variable abstraction or parameterisation, ..., recursion) generate (if the GE rules pattern is followed) many problems for the programme of mechanically generating one (or more) elimination rules for a logical constant, other than in simple cases. There are difficulties with the mechanical approach in [8]; there are similar difficulties in [13]. Without success of such a programme, it is hard to see what “GE harmony” can amount to, except as carried out in (e.g.) Coq [1] where strictly positive inductive type definitions lead automatically to rules for reasoning by induction and case analysis over objects of the types thus defined, and with strong normalisation results. A similar conclusion is to be found in [33].