1 Introduction

In Isabelle/HOL, there are specialised procedures for dealing with e.g. natural numbers, linear arithmetic, and metric spaces. Some of these procedures have been verified in Isabelle/HOL, such as a procedure for Presburger arithmetic [12] that was later extended to mixed real-integer arithmetic [11]. This procedure, though, uses reflection to work on goals in Isabelle/HOL, which, during execution, either sacrifices speed by going through the simplifier or requires trusting the code generator. More recently, Stevens and Nipkow [25] presented a verified decision procedure for orders that produces certificates. This approach offers efficient execution by using generated code as well as soundness because the certificates are replayed through Isabelle’s inference kernel.

This paper focuses on another ubiquitous structure in mathematics, namely sets. To the best of our knowledge, we present the first formally verified decision procedure for (a fragment of) set theory. In particular, we consider a quantifier-free fragment which Cantone and Zarba [9] call multi-level syllogistic with singleton (MLSS). The fragment includes the usual set operations of union, intersection, difference, membership, equality and, in addition, it allows the construction of singleton sets.

Since MLSS admits a tableau calculus, generating certificates will be straightforward. Like with the aforementioned order solver, this paves the way for an integration of the decision procedure into Isabelle, adding to its growing body of verified decision procedures.

1.1 Contributions

We present a formalisation in Isabelle/HOL of a tableau calculus for MLSS due to Cantone and Zarba [9] [7, Chapter 14]. We prove soundness and completeness of the calculus and give an abstract specification of a decision procedure that exhaustively applies the rules of the calculus. To obtain total correctness of the procedure, we prove its termination. Additionally, we naively refine the abstract specification to an executable specification from which we can generate code. The formalisation initially follows the paper but offers a more thorough account of some important details:

  • We deliver the omitted proof of Lemma 2 in the paper [9], a key building block for the completeness proof of the calculus.

  • The formal proof of completeness reveals that the calculus lacks a rule for eliminating double negation.

  • We derive an explicit upper bound for the number of formulas in a tableau branch.

In the context of Isabelle/HOL, there is one crucial aspect that requires us to modify the calculus in the paper: the calculus works under the assumption that every variable is a set; however, this is not the case in Isabelle/HOL, e.g. consider the expression where n is a natural number. We call these variables urelements. To deal with them, we extend the calculus with a lightweight type system and a verified inference algorithm that identifies the urelements.

The modification of the calculus required non-trivial changes to the completeness proof. Here, the formalisation was instrumental because Isabelle immediately revealed which proofs had been broken. This illustrates the usefulness of ITPs for developing logic calculi: they allow us to confidently make modifications without compromising correctness.

All in all, the formalisation amounts to over 6000 lines of theory. It is part of the Archive of Formal Proofs (AFP) [24]. The entry provides an overview theory MLSS_Proc_All.thy that highlights the (mostly syntactic) differences between paper and formalisation and references the constants and theorems that are introduced in this paper.

1.2 Related Work

Since the literature on decidable fragments of set theory is vast, we only focus on MLSS here. Ferro et al. [14] were the first to show the decidability of the fragment. Subsequent work [6] found the decision problem to be NP-complete. To obtain a practical decision procedure, Cantone [4] proposed a tableau calculus, which was later improved by Beckert and Hartmer [1]. Both of these procedures construct a model during execution that guides the proof search. Beckert and Hartmer also cover an extension of the calculus with uninterpreted functions, which Cantone and Zarba [10] later revisited while avoiding the construction of a model during execution. In this paper, we consider a version of the latter procedure due to Cantone and Zarba [9] that is specialised to MLSS and where the branching rules of the calculus are set up to guarantee the mutual exclusivity of the branches. Later extensions of the calculus added certain interpreted functions, such as monotone functions [8] and the inverse of a function [5]. The latter extension notably includes the Cartesian product. Those extensions, though, did not improve upon the tableau calculus for MLSS.

There is a large body of work at the intersection of ITPs and tableau methods, but to keep with this paper’s theme we only consider formalisations of correctness here. For first-order logic, there are abstract completeness proofs using the Beth-Hintikka style of possibly infinite derivation trees [3] as well as the Henkin style of maximally consistent sets [17]. Both are abstract enough to be instantiated with a wide range of concrete calculi. A more concrete formalisation [19] verifies a sequent calculus for first-order logic whose completeness proof is via a translation to semantic tableau.

Beyond completeness, we target decidability, which is more attainable for propositional logic. There is a verified tableau calculus for the modal logic S5 [2] in Lean and one for hybrid logic [18] in Isabelle/HOL. Neither of these proves termination, but there is a formalisation of a tableau calculus for the temporal logic CTL in Coq [13] that does.

1.3 Notation

Isabelle/HOL [21] conforms to everyday mathematical notation for the most part. We establish notation and in particular some essential data types together with their primitive operations that are specific to Isabelle/HOL.

We write t :: ’a to specify that the term t has the type ’a and ’a \(\Rightarrow \) ’b for the space of total functions from type ’a to type ’b.

Sets with elements of type ’a have the type . The cardinality of a set A is denoted by and the image of A under f by .

We use ’a list to describe the type of lists, which are constructed using the empty list [] constructor or the infix cons constructor #, and are appended with the infix operator @. The function set converts a list into a set.

We remark that \(\longleftrightarrow \) is equivalent to \(=\) on the type of Booleans and \(\equiv \) is definitional equality of the meta-logic of Isabelle/HOL, which is called Isabelle/Pure. Meta-implication is denoted by \(\Longrightarrow \) and a chain of implications \(A_1 \Longrightarrow \cdots \Longrightarrow A_n \Longrightarrow B\) can be abbreviated by \(\llbracket A_1; \ldots ; A_n \rrbracket \Longrightarrow B\).

2 Syntax and Semantics of MLSS

2.1 Syntax

At the heart of MLSS, we have the type of set terms, which is the disjoint union of the empty set and variables as well as the operations union, intersection, difference, and the singleton set represented by the constructor . We keep the type of variables abstract by making it a parameter of the set term data type. The only restriction on the type of variables is that it needs to be infinite. Isabelle/HOL’s data type package automatically defines a function that gives us the set of variables in a set term, which we name . In what follows, we will overload the function to also work on set atoms, formulas, and branches.

figure p

We can combine two set terms to form a set atom by using the membership or the equality operator.

figure q

With the above operators we can also represent the subset operator \(\sqsubseteq _\text {s}\) and enumerate finite sets: is equivalent to and a finite set of elements can be expressed by .

We use the propositional fragment of formulas due to Nipkow [20] with set atoms as propositional atoms to form the quantifier-free fragment MLSS of set theory.

figure v

We will often drop the atom constructor to reduce clutter. Additionally, we use and to denote and , respectively.

Similarly to , we get the function for free that retrieves all set atoms in a formula. We combine these functions to extract all the variables occurring in a set formula.

figure ad

Likewise, we fix the constant that is polymorphic in its argument type b. We overload this constant to return the set terms that are subterms of a set term, set atom, or formula, respectively. Lastly, we introduce the function that computes the subformulas of a formula. The functions and are implemented in the expected way.
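To make the shape of the syntax concrete, the following Python sketch mirrors the three layers described above: set terms, set atoms, and propositional formulas over atoms. It is an informal illustration and not part of the formalisation; all class and function names (Empty, Var, BinOp, Single, Atom, Neg, Conn, subterms, vars_of) are hypothetical, and variables are fixed to strings here whereas the formalisation keeps their type abstract.

```python
from dataclasses import dataclass

# --- set terms (hypothetical names, not the Isabelle constructors) ---
@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:            # op is one of "union", "inter", "diff"
    op: str
    left: object
    right: object

@dataclass(frozen=True)
class Single:
    elem: object

# --- set atoms and formulas ---
@dataclass(frozen=True)
class Atom:             # rel is "in" or "eq"
    rel: str
    left: object
    right: object

@dataclass(frozen=True)
class Neg:
    arg: object

@dataclass(frozen=True)
class Conn:             # op is "and" or "or"
    op: str
    left: object
    right: object

def subterms(x):
    """Set terms occurring in a set term, set atom, or formula."""
    if isinstance(x, (Empty, Var)):
        return {x}
    if isinstance(x, Single):
        return {x} | subterms(x.elem)
    if isinstance(x, BinOp):
        return {x} | subterms(x.left) | subterms(x.right)
    if isinstance(x, Neg):
        return subterms(x.arg)
    # Atom or Conn: not a set term itself, so only collect from the children
    return subterms(x.left) | subterms(x.right)

def vars_of(x):
    """Names of all variables occurring in a set term, set atom, or formula."""
    return {t.name for t in subterms(x) if isinstance(t, Var)}

# x ∈ (a ∪ {y})  ∧  ¬(a = ∅)
phi = Conn("and",
           Atom("in", Var("x"), BinOp("union", Var("a"), Single(Var("y")))),
           Neg(Atom("eq", Var("a"), Empty())))
```

Using one overloaded function for terms, atoms, and formulas follows the overloading convention of this section; a single dispatch on the node type is enough because the three layers never overlap.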

2.2 Semantics

The original paper [9] bases the semantics of MLSS on the von Neumann hierarchy of sets \(\mathcal {V}\). We instead use the hierarchy of hereditarily finite sets (HF sets) which fulfil all the same axioms as \(\mathcal {V}\) – that is, the axioms of ZF – except for the axiom of infinity. In particular, the membership relation is well-founded. The HF sets, as we will see, are sufficient to construct a model for any satisfiable MLSS formula. In contrast to \(\mathcal {V}\), the HF sets are directly representable in Isabelle/HOL, and indeed, an AFP entry [23] formalises them. The entry defines a type that comes with the following functionality:

  • The function that converts a finite set of HF sets into an HF set.

  • The usual set operations such as equality (\(=\)), membership (\(\boldsymbol{\in }\)), union (\(\sqcup \)), intersection (\(\sqcap \)), and difference (−) are defined.

  • Finally, the empty set coincides with the ordinal 0, so it is denoted by .

Equipped with the above, we define the interpretation functions

  • and

in the standard way, i.e. by mapping each syntactic construct to the corresponding operation on HF sets and interpreting variables with respect to a given valuation function . For the concrete definition we refer to the formalisation.

We write M \(\models \phi \) for the judgement that the formula \(\phi \) holds under the valuation function M. The implementation of \(\models \) coincides with the interpretation function of Nipkow [20]. As usual, we call a formula \(\phi \) satisfiable if there exists a model M with M \(\models \phi \). Otherwise, we say that \(\phi \) is unsatisfiable.
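The interpretation can be sketched in a few lines of Python if we model HF sets as nested frozensets. This is only an illustration under our own encoding, not the definition from the formalisation: terms and formulas are tagged tuples, the atom constructor is dropped, and the names interp_term and holds are hypothetical.

```python
# Terms:    ("empty",), ("var", x), ("single", t),
#           ("union", s, t), ("inter", s, t), ("diff", s, t)
# Formulas: ("in", s, t), ("eq", s, t), ("not", f), ("and", f, g), ("or", f, g)

def interp_term(M, t):
    """Map a set term to a nested frozenset under the valuation M (a dict)."""
    tag = t[0]
    if tag == "empty":
        return frozenset()
    if tag == "var":
        return M[t[1]]
    if tag == "single":
        return frozenset({interp_term(M, t[1])})
    if tag in ("union", "inter", "diff"):
        a, b = interp_term(M, t[1]), interp_term(M, t[2])
        return {"union": a | b, "inter": a & b, "diff": a - b}[tag]
    raise ValueError(f"unknown term tag: {tag}")

def holds(M, phi):
    """Decide whether the formula phi holds under the valuation M."""
    tag = phi[0]
    if tag == "in":
        return interp_term(M, phi[1]) in interp_term(M, phi[2])
    if tag == "eq":
        return interp_term(M, phi[1]) == interp_term(M, phi[2])
    if tag == "not":
        return not holds(M, phi[1])
    if tag == "and":
        return holds(M, phi[1]) and holds(M, phi[2])
    if tag == "or":
        return holds(M, phi[1]) or holds(M, phi[2])
    raise ValueError(f"unknown formula tag: {tag}")

x, y = ("var", "x"), ("var", "y")
M = {"x": frozenset(), "y": frozenset({frozenset()})}   # x = ∅, y = {∅}
```

With this valuation, holds(M, ("in", x, y)) evaluates the atom x ∈ y by checking that the frozenset for x is an element of the frozenset for y.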

3 A Tableau Calculus for MLSS

We formalise the tableau calculus for MLSS as described by Cantone and Zarba [9]. Inspired by the formalisation of a tableau calculus for hybrid logic by From [16], we use lists to represent the branches of the tableau tree. Note that we add formulas to the front of the list during branch expansion, so for a branch b is always the formula we are trying to disprove with the tableau. We sometimes call this formula the initial formula.

figure ap

We lift the functions and to branches in the expected way.

In the standard tableau calculus for propositional logic as Fitting [15] describes it, a branch is called closed if it contains both the negation of a formula and the formula itself; conversely, it is called open if it is not closed. For MLSS, we extend the notion of closedness with three additional rules; the first two are straightforward while the last one states that a branch is closed when the branch contains a membership cycle .

figure at

A tableau is called closed if all of its branches are closed.
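As an illustration of the closedness conditions, the Python sketch below checks a branch given as a list of tagged tuples. The propositional condition and the membership-cycle condition are taken from the text; for the two "straightforward" conditions we assume that they are a membership in the empty set and an inequality of a term with itself, which is our reading of the soundness argument in Sect. 6. The names is_closed and membership_cycle are hypothetical.

```python
# Formulas: ("in", s, t), ("eq", s, t), ("not", f), ("and", f, g), ("or", f, g)
# Terms:    ("empty",), ("var", x), ...

def membership_cycle(branch):
    """Detect a cycle in the graph with an edge s -> t for every literal s ∈ t."""
    edges = {}
    for f in branch:
        if f[0] == "in":
            edges.setdefault(f[1], set()).add(f[2])
    visiting, done = set(), set()

    def dfs(v):
        if v in done:
            return False
        if v in visiting:          # back edge: we returned to an open vertex
            return True
        visiting.add(v)
        found = any(dfs(w) for w in edges.get(v, ()))
        visiting.discard(v)
        done.add(v)
        return found

    return any(dfs(v) for v in list(edges))

def is_closed(branch):
    formulas = set(branch)
    for f in branch:
        if f[0] == "not" and f[1] in formulas:        # φ and ¬φ
            return True
        if f[0] == "in" and f[2] == ("empty",):       # assumed rule: s ∈ ∅
            return True
        if f[0] == "not" and f[1][0] == "eq" and f[1][1] == f[1][2]:
            return True                               # assumed rule: s ≠ s
    return membership_cycle(branch)

x, y = ("var", "x"), ("var", "y")
```

For example, is_closed([("in", x, y), ("in", y, x)]) reports a closed branch because the two literals form a membership cycle.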

3.1 Linear Expansion Rules

Table 1. Linear expansion rules. All rules except the double negation rule coincide with the original paper [9]. For brevity, we omit the rules for \(\sqcap _\text {s}\) and \(-_\text {s}\).

The calculus considers two kinds of branch expansion rules: linear and branching rules. As the name suggests, branching rules lead to the creation of new branches in the tableau while linear rules only extend a branch b with new formulas , which we denote by . Table 1 shows the linear expansion rules. Note that in the first two rules for \(=_\text {s}\), l is a literal occurring in the branch. Furthermore, the term-for-term substitution is restricted to the top-level set terms of l, i.e. the set terms that occur directly under one of the atom constructors \(\in _\text {s}\) or \(=_\text {s}\); for example, given the literal

figure ax

we have

figure ay

A more crucial restriction of the linear rules is that no new subterm may be created by their application; for instance, the second rule for \(\in _\text {s}\) is

figure az

which formally represents

figure ba

and may only be used under the condition . The purpose of this restriction is to prevent unbounded expansion of the branch. In fact, we give an explicit upper bound for the number of formulas in a branch in Sect. 7.

Due to boundedness, repeated expansion with linear rules eventually results in a linearly saturated branch, i.e. a branch where no application of linear rules would produce new formulas.

figure bc

Finally, we remark that the original paper [9] is missing the last propositional rule dealing with double negation. This rule is required for completeness, though, considering that the branch is saturated—neither linear nor branching rules apply—and open, but there clearly is no model for the initial formula .
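The interplay between linear expansion and the subterm restriction can be sketched as follows. The Python code covers only a hand-picked subset of linear rules (conjunction, negated disjunction, double negation, and one membership rule for unions that fires only if the union already occurs as a subterm of the branch); the exact rule set is that of Table 1, and the function names are hypothetical.

```python
# Formulas: ("in", s, t), ("eq", s, t), ("not", f), ("and", f, g), ("or", f, g)
# Terms:    ("empty",), ("var", x), ("single", t), ("union", s, t), ...

def subterms(x):
    tag = x[0]
    if tag in ("empty", "var"):
        return {x}
    if tag == "single":
        return {x} | subterms(x[1])
    if tag in ("union", "inter", "diff"):
        return {x} | subterms(x[1]) | subterms(x[2])
    if tag == "not":
        return subterms(x[1])
    return subterms(x[1]) | subterms(x[2])     # in, eq, and, or

def linear_consequences(branch):
    """Yield the conclusions of each applicable instance of the sketched rules."""
    terms = set().union(*(subterms(f) for f in branch))
    for f in branch:
        if f[0] == "and":
            yield [f[1], f[2]]
        elif f[0] == "not" and f[1][0] == "or":
            yield [("not", f[1][1]), ("not", f[1][2])]
        elif f[0] == "not" and f[1][0] == "not":   # double negation rule
            yield [f[1][1]]
        elif f[0] == "in":
            # s ∈ t1 gives s ∈ t1 ∪ t2 only if t1 ∪ t2 is already a subterm
            for u in terms:
                if u[0] == "union" and f[2] in (u[1], u[2]):
                    yield [("in", f[1], u)]

def saturate(branch):
    """Apply the sketched linear rules until no instance adds a new formula."""
    branch = list(branch)
    while True:
        new = next((c for c in linear_consequences(branch)
                    if any(g not in branch for g in c)), None)
        if new is None:
            return branch
        for g in new:
            if g not in branch:
                branch.insert(0, g)    # new formulas go to the front

x, a, b = ("var", "x"), ("var", "a"), ("var", "b")
phi = ("and", ("not", ("not", ("in", x, a))), ("eq", ("union", a, b), b))
```

Because the membership rule only ever targets unions that are already subterms of the branch, saturate([phi]) adds the literal x ∈ a ∪ b but never invents a term such as a ∪ x, which is what keeps the loop finite.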

Table 2. Branching expansion rules. We write \(\phi \) for here. All rules coincide with the original paper [9] so we only show an illustrative subset.

3.2 Branching Rules

After running out of linear rules to apply, only the branching rules shown in Table 2 remain. A rule is applicable if its precondition is met and, to prevent unnecessary branching, if it is not subsumed as indicated by the subsumption condition. These rules create multiple branches in the tableau, so we represent the different possibilities to expand a branch b as a set and write . Accordingly, we get a new branch in the tableau for each .

A linearly saturated branch where no further branching is possible is called a saturated branch.

figure bk

Note that even branching rules are defined such that they never create new subterms, except for the last rule that adds a new variable to the branch. These variables serve to manifest an inequality; hence, we call them witnesses.

figure bl

4 A Decision Procedure for MLSS

The mechanics of the decision procedure are typical for a procedure based on a tableau calculus: it decides the satisfiability of a given formula \(\phi \) by determining whether the formula has a closed tableau. More specifically, it initialises the tableau with the singleton branch [\(\phi \)] and checks whether this branch can be expanded to a closed tableau.

We only discuss the abstract specification here and refer the reader to the formalisation for the executable specification. The implementation uses a couple of features of Isabelle/HOL’s function package: instead of defining the function via pattern matching, we specify the equations of the function as conditional rewrite rules. This requires us to prove that the assumptions of the equations are non-overlapping, which is done by automation. The other concern is that Isabelle/HOL requires functions to be total, so a recursive function needs to terminate for it to be well-defined; nevertheless, the termination proof is separated from the definition of the function for modularity. The function package maintains the soundness of the definition by introducing a so-called domain predicate which characterises the arguments for which the function terminates. Each equation of the function is guarded by an assumption that the predicate holds for the argument. In Sect. 7, we will show that the domain predicate holds for the context in which the function is called. Before we go into more detail on how the termination is proved, we discuss the definition of the function, as shown below.

figure bo

The purpose of the function is to determine whether we can expand a given branch to a closed tableau. As stated before, we first use linear expansion rules in order to prevent premature branching; to this end, we recursively expand the branch with linear rules until the branch is linearly saturated. Note that we use Hilbert’s \(\varepsilon \)-operator, which Isabelle/HOL provides as the SOME binder, to choose some rule that actually adds new formulas to the branch. As soon as the branch is linearly saturated, we terminate if the branch is closed as the second equation shows. Otherwise, we choose an applicable branching rule and recursively check whether all newly created branches can be closed. The final equation applies once no further branch expansion is possible, in which case we just test for closedness of the branch.

The procedure then calls with a singleton branch [\(\phi \)] to determine the satisfiability of a given formula \(\phi \).
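The control flow described above can be sketched generically in Python. The function closeable below is parameterised over the rule engine, so it does not commit to the MLSS rules; to have something runnable we instantiate it with a toy propositional calculus for formulas in negation normal form. All names are hypothetical, and the toy branching rule does not reproduce the mutually exclusive branches of the actual calculus.

```python
def closeable(branch, linear_step, branching_step, is_closed):
    """Return True if `branch` can be expanded to a closed tableau."""
    new = linear_step(branch)
    if new:                                  # some linear rule adds formulas
        return closeable(new + branch, linear_step, branching_step, is_closed)
    if is_closed(branch):                    # linearly saturated and closed
        return True
    alternatives = branching_step(branch)
    if not alternatives:                     # saturated and open
        return False
    return all(closeable(ext + branch, linear_step, branching_step, is_closed)
               for ext in alternatives)

# --- toy instantiation: ("atom", p), ("not", f), ("and", f, g), ("or", f, g) ---
def toy_linear(branch):
    for f in branch:
        if f[0] == "and":
            parts = [f[1], f[2]]
        elif f[0] == "not" and f[1][0] == "not":
            parts = [f[1][1]]
        else:
            continue
        new = [g for g in dict.fromkeys(parts) if g not in branch]
        if new:
            return new
    return []

def toy_branching(branch):
    for f in branch:
        # subsumption check: skip disjunctions that already have a disjunct
        if f[0] == "or" and f[1] not in branch and f[2] not in branch:
            return [[f[1]], [f[2]]]
    return []

def toy_closed(branch):
    return any(f[0] == "not" and f[1] in branch for f in branch)

def satisfiable(phi):
    """Start from the singleton branch [phi]; open tableau means satisfiable."""
    return not closeable([phi], toy_linear, toy_branching, toy_closed)

p, q = ("atom", "p"), ("atom", "q")
```

Passing the rule engine as arguments separates the search strategy (linear rules first, then branching) from the calculus itself, which is the same separation the abstract specification makes.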

Thus, this function is only used on branches that result from applying the expansion rules. We call this kind of branch well-formed. In the definition below, the expression denotes that b’ is one of the branches that results from applying zero or more expansion rules to b.

figure bv

We use this notion in Sect. 7 to state an upper bound for the cardinality of well-formed branches. The upper bound justifies the termination of the decision procedure. Before we come to that, though, we prove soundness and completeness in Sect. 6 and 5, respectively. In Sect. 7, we also show that both properties easily transfer to , which, together with termination, establishes that it is a decision procedure.

5 Completeness of the Calculus

For completeness of the calculus, we need to show that every unsatisfiable formula has a closed tableau or, conversely, that the formula is satisfiable if there is a saturated and open branch in the tableau. To facilitate inductive reasoning, we show a stronger statement by constructing a model M such that M \(\models \phi \) for all . At the core of the model, there is a realisation function that maps set terms to sets of type hf. A subset of the witnesses, which we call pure witnesses, receives special treatment from the realisation function for reasons that will become apparent in Sect. 5.1. The collection of set terms of a branch can thus be partitioned into two collections, as defined below.

figure by

We aim to construct a syntactic model that we derive from the membership literals in the branch. To this end, we construct a graph whose vertices are the disjoint union of the sets above and there is an edge from s to t in the graph if, and only if, is in b. Note that we use Noschinski’s graph library [22] which represents a graph as a record of vertices, arcs (directed edges), and two functions and that map an arc to its source and target vertex, respectively.

figure cd

The realisation function is defined relative to this graph. As mentioned before, the realisation function treats the pure witnesses differently than the rest of the set terms. The function evaluates terms in the latter set in accordance to the structure of the graph, i.e. the realisation of a vertex is defined as the union of the realisations of the parent vertices. For the former set, we choose a function I that assigns the pure witnesses pairwise distinct sets with cardinality greater than that of the vertices. We can always choose such a function since we assume an infinite universe of variables. Then, we return the singleton set , which, together with the cardinality constraint, guarantees that realisations are distinct between pure witnesses themselves as well as between pure witnesses and set terms. The notation in the definition below indicates that there is an edge from u to s in the graph G.

figure cg

Again, we need to ensure that the assumptions of the equations are non-overlapping and that the function terminates. The former is taken care of by automation, leaving us to prove termination. The assumption that b is open implies that there are no membership cycles, thus is acyclic. Furthermore, the graph is finite by definition. Thus, we can use the cardinality of the set of ancestors as a measure that decreases in each recursive call.
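The construction can be illustrated with nested frozensets in Python. In this sketch, vertices are plain strings, the membership literals of the branch are given as pairs (s, t) for s ∈ t, and the graph is assumed to be acyclic. We read the realisation of an ordinary vertex as the set that collects the realisations of its parents, which is what the proof of Proposition (2) in Sect. 5.2 suggests; the choice of von Neumann ordinals for I is ours and is only one way to obtain pairwise distinct sets that are large enough.

```python
def ordinal(n):
    """Von Neumann ordinal n as a nested frozenset; it has exactly n elements."""
    s = frozenset()
    for _ in range(n):
        s = s | {s}
    return s

def realise(terms, pure_witnesses, membership):
    """Realise every vertex of the (assumed acyclic) membership graph.

    terms, pure_witnesses: disjoint sets of vertex names
    membership: pairs (s, t), one for each literal s ∈ t in the branch
    """
    parents = {v: set() for v in terms | pure_witnesses}
    for s, t in membership:
        parents[t].add(s)
    n = len(parents)
    # I: pairwise distinct sets whose cardinality exceeds the number of vertices
    I = {w: ordinal(n + 1 + i) for i, w in enumerate(sorted(pure_witnesses))}
    cache = {}

    def go(v):
        if v not in cache:
            if v in pure_witnesses:
                cache[v] = frozenset({I[v]})
            else:
                cache[v] = frozenset(go(u) for u in parents[v])
        return cache[v]

    return {v: go(v) for v in parents}

R = realise({"a", "b", "c"}, {"w1", "w2"},
            {("a", "b"), ("w2", "b"), ("w1", "c")})
```

The recursion terminates because every call moves to a parent in an acyclic finite graph, mirroring the measure used in the termination proof; the cache merely avoids recomputing shared ancestors.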

Before we prove that the realisation function constitutes a model in Sect. 5.2, we will first explain the significance of the pure witnesses.

5.1 Characterisation of the Pure Witnesses

Recall that the pure witnesses of a branch b are those witnesses that are not related to other subterms in by equality. In the context of a well-formed branch, we can strengthen this characterisation to any set term and, in addition, we also get that there is no membership literal where a pure witness is on the right-hand side. Intuitively speaking, the realisation of a pure witness does not depend on the realisation of any other set term.

figure cj

So why are pure witnesses treated differently? According to the definition of the realisation function, the pure witnesses would evaluate to the empty set 0, were they not treated separately. To see that this is a problem, consider the branch which expands to several open and saturated branches, one of which is

figure cn

for some fresh x and y. Assigning both and a value of 0 would contradict the literal . To prevent this, we assign the pure witnesses pairwise different values.

The proof of lemma_2 is more technical than interesting so we refer the reader to the formalisation.

5.2 Realisation of an Open Branch

Remember that for completeness, we need to show that the realisation function for an open and saturated branch b actually constitutes a model for all formulas in the branch. We start by verifying that the realisation function models all literals in the branch; more formally, the following propositions hold:

  (1) We have if it holds that is in b.

  (2) We have if is in b.

  (3) We have if is in b.

  (4) We have if it holds that is in b.

To illustrate the usefulness of lemma_2, we prove Proposition (2). The proofs of all propositions translate well into Isabelle, so we refer to the original paper [9] for the remaining proofs.

Proof

(Proof of Proposition (2)). Assume that is in b. If there exists a where or , we arrive at a contradiction due to lemma_2. Therefore, both and must hold. Now, assume for contradiction that . Without loss of generality—the other case is symmetric—we obtain an e such that and . Considering that and the definition of , we obtain a d with and . This, in turn, yields that must be in b. Together with the assumption and the saturation of b, it follows that must also be in b. But then we have using Proposition (1), which is a contradiction to the assumption .

We now lower the results on literals to set terms. All of the proofs are straightforward so we refer the reader to the formalisation.

  (a) It holds that .

  (b) Let . If the term occurs in , then

    figure dv

  (c) If , then

    figure dx

The final step for obtaining a proper model is to connect the realisation function to the semantics as defined in Sect. 2. For set terms, we can use the Propositions (a)–(c) to prove the lemma below by induction on t.

figure dy

Lifting the above result to formulas yields the coherence of b, as the original paper [9] calls it. The proof is a tedious but straightforward induction on the size of the formulas.

figure dz

The coherence property finishes the proof of completeness of the calculus as it gives us a model for every formula in an open and saturated branch.

6 Soundness of the Calculus

A tableau calculus is sound if the corresponding formula is unsatisfiable for any closed tableau. We prove the following two properties to establish soundness:

  (1) It is impossible to satisfy all formulas in a closed branch simultaneously.

  (2) The expansion rules maintain satisfiability.

We formalise the first property in Isabelle below.

figure ea

Proof

It is clear that, for any s, neither does M model nor . Furthermore, no model can satisfy both \(\phi \) and at the same time. Lastly, a membership cycle is impossible since the membership relation of is well-founded.

We are left with showing that both linear and branching expansion rules preserve satisfiability. As for the linear rules, a straightforward proof by case analysis on suffices to obtain the lemma below.

figure eg

A similar argument would work for the branching rules if it were not for the last rule adding new variables. Those variables need to be assigned specific values; hence, we modify the model as shown in the proof below.

figure eh

Proof

We only consider the case where was proved by applying the last branching expansion rule to for some s and t. We have

figure ek

for some fresh variable x. Since is in b, we have that because M is a model. Without loss of generality, this inequality manifests itself through some y with and . We update M to map x to y to obtain the assignment M’. Note that M’ is still a model for formulas in b because x is fresh with respect to b. Furthermore, it is also a model for the first branch in , which finishes the proof.
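The model update in this proof is simple enough to state as code. The Python sketch below takes the values of s and t under M as frozensets, picks an element y that lies in exactly one of them, and returns the updated valuation M’ together with a flag telling which side y came from. The function name witness_update and the dictionary representation of valuations are assumptions of this illustration.

```python
def witness_update(M, s_val, t_val, x):
    """Extend M with a value for the fresh variable x that separates s and t.

    s_val, t_val: the (distinct) values of s and t under M
    Returns (M', in_s) where M'[x] = y and in_s says whether y ∈ s_val.
    """
    if s_val == t_val:
        raise ValueError("s and t must have different values")
    if x in M:
        raise ValueError("x must be fresh")
    only_s = s_val - t_val
    # the inequality shows up on at least one side; prefer y ∈ s, y ∉ t
    y = next(iter(only_s or (t_val - s_val)))
    return {**M, x: y}, bool(only_s)

empty = frozenset()
M = {"s": frozenset({empty}), "t": empty}
M2, in_s = witness_update(M, M["s"], M["t"], "x")
```

Since x does not occur in M, the update leaves the value of every other variable untouched, which is the reason M’ remains a model of the formulas already in the branch.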

7 Total Correctness of the Decision Procedure

We first demonstrate the termination of the procedure for well-formed branches, i.e. every well-formed branch is in the domain of . To this end, we derive an upper bound for the number of distinct formulas in a branch whose proof we omit here for brevity. We should point out that this bound is not to be construed as the complexity of the procedure as it may create exponentially many branches in general.

figure er

Remember that only applies a linear expansion rule to a branch if the application results in new formulas. Moreover, the subsumption conditions of the branching expansion rules ensure that each of the newly created branches contains new formulas. Ultimately, we conclude that the procedure must terminate for well-formed branches because the number of formulas increases in each step but is also bounded.

figure et

The above lemma allows us to utilise the computation induction rule of on well-formed branches, which we use to prove soundness and completeness. As both proofs are essentially an application of soundness, respectively completeness, of the calculus, we refer the reader to the formalisation.

figure ev

To finish the proof of total correctness, note that every singleton branch is trivially well-formed; thus, termination, completeness, and soundness easily transfer to .

figure ex

8 Dealing with Urelements

In the introduction, we stated the goal of integrating as a tactic into Isabelle. For this to work, we must map every branch expansion rule to a corresponding theorem in Isabelle/HOL. This is straightforward for all expansion rules except for the last branching expansion rule. To illustrate, suppose that we are to disprove a statement of the form

figure ez

in Isabelle/HOL. By way of reification, we convert this to a formula of the shape

figure fa

in our set syntax for some s’, t’, A’, and B’. When we apply the decision procedure to this formula, it might return a tableau proof that contains an application of the last branching rule to . This results in two branches, one of which is ; however, there is no matching rule in Isabelle/HOL since s and t are not sets.

Fig. 1. The type system for set terms and atoms.

To deal with this problem, we formalise a lightweight type system as displayed in Fig. 1. The type of a set term in this system is just a natural number which we call level. Intuitively speaking, the level l means that the corresponding term t in Isabelle/HOL has type

$$ \texttt {'a}\ \underbrace{\texttt {set}\ \ldots \ \texttt {set}}_{\texttt {l}\ \text {times}} $$

for some ’a. Note that the constructor \(\emptyset \) now receives an additional argument indicating the level of each instance of \(\emptyset \).

Moreover, the typing judgement extends to set atoms by matching up the levels of its component set terms.

Ultimately, we define in order to type formulas.

We can now define the urelements with respect to a formula. An urelement is a set term whose corresponding type in Isabelle/HOL might not be a set.

figure fe

Using this definition, we make two changes to the specification of the calculus: (1) First and foremost, we require that neither s nor t is an urelement in the precondition of the last branching expansion rule. (2) As mentioned above, we add an argument to the \(\emptyset \) constructor. This argument is only used for the typing judgement; it has no impact on the semantics.

Soundness, of course, is not affected by these changes but we have to make a few amendments to maintain completeness: (1) The first equation of now also must account for the urelements. In particular, it has to ensure that urelements receive pairwise different values unless they are related through equality atoms. This does not affect pure witnesses since they cannot be related through equality atoms due to lemma_2. (2) We must adjust the completeness proof in those places where it directly refers to the definition of to account for the case where a given term is an urelement. (3) The completeness theorem receives the additional assumption that \(\varGamma \vdash \phi \) holds for the initial formula \(\phi \). (4) For the completeness proof, we must show that the typing judgement is invariant under branch expansion.

The modifications above ensure that the proof can be replayed through Isabelle/HOL. To actually use the calculus, we must determine the urelements of the initial formula \(\phi \), though. In other words, we have to implement an inference algorithm for our lightweight type system. The algorithm is, in essence, a simplified version of Hindley-Milner type inference so it has the same two phases: it generates constraints using syntax directed rules and then passes them to a constraint solver.

Since we are only interested in the level of a term, we can encode all constraints into the theory of 0, the successor function S, and equality (but no disequality). Note that constraints of the form can be replaced by with i being a fresh variable. A solver for this theory is straightforward to implement and verify; nevertheless, we have to be careful that it computes the minimum assignment \(\varGamma \) from variables to levels that fulfils the constraints. This guarantees that a set term t is not an urelement if, and only if, . Conversely, all terms s with are urelements.
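A solver for such level constraints can be sketched with a union-find structure that stores, for every node, its offset to the representative of its class. The Python code below is an illustration and not the verified solver from the formalisation: a constraint ((a, i), (b, j)) is our hypothetical encoding of the equation S^i(a) = S^j(b), the special node ZERO stands for the constant 0, and variables are strings. The result is the pointwise minimal assignment of natural numbers, or None if the constraints are unsatisfiable.

```python
ZERO = None   # node for the constant 0; variables are strings

def solve_levels(constraints):
    """Solve constraints level(a) + i = level(b) + j over the naturals.

    Returns the minimal assignment as a dict, or None if there is none.
    """
    parent, offset = {}, {}

    def find(v):
        """Return (root, d) such that level(v) = level(root) + d."""
        parent.setdefault(v, v)
        offset.setdefault(v, 0)
        d = 0
        while parent[v] != v:
            d += offset[v]
            v = parent[v]
        return v, d

    find(ZERO)
    for (a, i), (b, j) in constraints:
        ra, da = find(a)
        rb, db = find(b)
        if ra == rb:
            if da + i != db + j:          # e.g. x = S(x)
                return None
        else:
            parent[ra] = rb               # level(ra) = level(rb) + offset[ra]
            offset[ra] = db + j - da - i

    classes = {}
    for v in list(parent):
        r, d = find(v)
        classes.setdefault(r, []).append((v, d))

    levels = {}
    for members in classes.values():
        zero = [d for v, d in members if v is ZERO]
        # pin the class to the constant 0 if present, else lower it to 0
        shift = -zero[0] if zero else -min(d for _, d in members)
        for v, d in members:
            if d + shift < 0:             # some term would be below level 0
                return None
            if v is not ZERO:
                levels[v] = d + shift
    return levels

# x ∈ A, A = B, y = {x}: level(A) = level(x) + 1, and so on
levels = solve_levels([(("A", 0), ("x", 1)),
                       (("A", 0), ("B", 0)),
                       (("y", 0), ("x", 1))])
urelements = {v for v, l in levels.items() if l == 0}
```

Lowering every class that is not tied to the constant 0 until its smallest member reaches level 0 is what makes the assignment minimal; in this sketch, the variables that end up at level 0 are the ones we would treat as urelements.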

9 Conclusion and Future Work

We developed a formalisation of a tableau calculus for a quantifier-free fragment of set theory called MLSS based on a paper by Cantone and Zarba [9]. The formalisation includes an abstract description of a decision procedure that builds on the calculus. To make the decision procedure compatible with Isabelle/HOL, we extended the calculus with a lightweight type system while maintaining completeness. We also refined the abstract specification to an executable specification from which code can be generated.

In future work, we plan to implement an efficient executable specification in the style of a worklist algorithm. This specification should also generate certificates that can be replayed through Isabelle’s inference kernel to facilitate the integration of the procedure into Isabelle.