1 Introduction

The theory of modal logic has long been a fruitful area when it comes to mechanisation. The proof systems are appealing, and the applications in model-checking are of clear real-world interest. It helps also that the subject domain (proof calculi and automata) are well-suited to “standard” theorem-proving technology (rule inductions and interesting data types).

There has been much less work on the model theory behind modal logic; indeed even in first order logic, most developments concern themselves only with model theory inasmuch as it is required to show completeness of an accompanying proof system. As our experience demonstrates, it is also clear that modern theorem-proving systems are not necessarily so well-suited to the mathematics behind model theory. Harrison [5] complained in 1998 that the very notion of validity is awkward to capture in HOL, and our own work shows up further failings in simple type theory.

Nonetheless, there is much interesting mathematics to be found even in the early chapters of a standard text such as Blackburn et al. [1]. The fact that mechanising only as far as [1, Chapter 2] requires what we believe to be the first mechanisation of the notion of ultraproduct (ultimately leading to Łoś’s theorems and other results), is a strong suggestion that we are exploring novel mathematical ground for interactive theorem-proving systems.

Contributions. This paper presents the first mechanised proofs of a number of basic results from the first two chapters of Blackburn et al. [1] (e.g., bounded morphisms, bisimulations and the finite model property via selection), as well as

  • Two versions of Łoś’s theorem on the saturation of ultraproduct models;

  • modal equivalence as bisimilarity between ultrafilter extensions; and

  • a close approximation of Van Benthem’s Characterisation Theorem.

We also discuss where HOL’s simple type theory lets us down: some standard results (including the best possible statement of Van Benthem’s Characterisation Theorem) seem impossible to prove in our setting.

HOL4 Notation. All of our theorems have been pretty-printed to from the HOL theory files. We hope that most of the basic syntax is easy to follow. In a few places we use to denote the arbitrary choice of an element from set (appealing to the Axiom of Choice). The power-set of a set is written . In a number of places, we use HOL’s “itself” type to allow us to explicitly mention a type via a term. The type has just one value inhabiting it for any given choice of ; that value is written .

Source Availability. Our HOL4 sources are available from GitHub at https://github.com/u5943321/Modal-Logic

The sources build under HOL4 commit with SHA 03829d8986f.

2 Syntax, Semantics and the Standard Translation

In our mechanisation, we consider the basic modal language, in which the only primitive modal operator is the ‘\(\Diamond \)’. A modal formula is either of form , where p is of type , enumerating all the possible variable symbols, a disjunction (pretty-printed to in most places), the falsity , a negation , or, finally, of the form . We define a data type called to represent the formulas of this modal language.

Definition 1

[1, Definition 1.9]

figure q

If we wanted to consider modal operators with any arity, we should change the last constructor of modal formulas so it takes two parameters: a natural number indexing the modal operator, and a list of modal formulas. This would in turn require a well-formedness predicate to be defined over formulas to make sure that modalities were applied to the right number of arguments.

A model where these formulas can be interpreted consists of a frame and a valuation, where a is a -set with a relation on it, and a model adds valuations for the variables present at each world:

Definition 2

[1, Definition 1.19]

figure t

In the rest of the paper, the field \(.\mathsf valt\) of a model will be called the valuation, and , and are used to denote the world set, the relation, and the valuation of respectively. The interpretation of modal formulas on a model is given by the predicate satisfaction. We read ‘ ’ as ‘\(\phi \) is satisfied at the world w in ’.

Definition 3

[1, Definition 1.20]

figure ac

By requiring in various clauses above, we ensure that models’ world sets must be non-empty if they are to satisfy any formulas.

Two worlds and are modal equivalent (written ) if they satisfy the same set of modal formulas. If are modal formulas, we say they are equivalent over models (written ) if they are satisfied in the same worlds in every model:

Definition 4 (Notions of equivalence)

figure ak

We cannot omit the type parameter in the definition, as there would otherwise be a type, namely the type of the underlying set of the models we are talking about, that only appears on the right-hand side but not on the left-hand side of the definition. HOL forbids such definitions for soundness reasons. Also, HOL does not permit quantification over types, so it is impossible to write , with \(\mu \) a type. Therefore, this definition is not exactly encoding the equivalence in the usual sense: when we mention equivalence of formulas in usual mathematical language, we are implicitly referring to the class of all models, but the constraint here bans us from talking about all models of all possible types at once.

A modal formula can be translated into a first-order formula via the standard translation. To mechanise this translation, we build on Harrison’s construction of first-order logic [5]. The first-order connectives are decorated with an \(\mathsf f\). A first order model is a set with interpretation of function symbols and predicate symbols . A valuation of is a function that maps all the natural numbers into the domain of . If a first-order formula is satisfied in a first-order model with a valuation assigning free variables of elements in the domain of , we write .

For a modal formula , is the standard translation of using as the only free variable that may occur:

Definition 5

[1, Definition 2.45 (Standard Translation)]

figure be

As one would expect, we translate into an existential formula. To ensure we use a fresh variable, we use \(x+1\) as our new variable symbol in this clause. The standard translation gives a first-order reformulation of satisfaction of modal formulas:

Proposition 1

[1, Theorem 2.47 (i)]

figure bg

Here is the function that turns a modal model into a first-order model, defined as:

figure bi

That is, the model obtained by converting a modal model has domain , maps every term into an arbitrary world, maps each propositional letter to distinct predicates on worlds, and uses one binary predicate (the “0th predicate”) to encode the frame relation.

3 Basic Results

We discuss some highlights of mechanised results from Blackburn et al. [1, §2.1–§2.3] below.

3.1 Tree-Like Property

A tree-like model is a model whose underlying frame is a tree. If , a frame, is also a tree with root r, we write :

Definition 6

[1, Definition 1.7]

figure bo

The tree-like property says each satisfiable modal formula can be satisfied in a tree-like model:

Proposition 2

[1, Proposition 2.15]

figure bp

The world set of the tree-like model constructed from is a set of lists of worlds in (such lists are effectively paths from the root to various positions within the tree). Thus, passing to a tree-like model does not preserve the model type. The tree-like lemma is used to prove the finite model property via selection afterwards.

3.2 Bisimulation

Though apparently verbose, the definition of bisimulation in HOL is straightforward.

Definition 7

[1, Definition 2.16 (Bisimulations)]

figure bs

It is trivial to prove by induction that bisimilar worlds are modal equivalent. As the most significant theorem on the basic theory of bisimulations, we proved the Hennessy-Milner theorem, which states that modal equivalence and bisimulation on image-finite models are the same thing. An image-finite model is a model where every world can only be related to finitely many worlds. In HOL, we get:

Theorem 1

[1, Theorem 2.24 (Hennessy-Milner Theorem)]

figure bt

Bisimulation is an interesting topic in modal logic. Three other significant theorems on bisimulations (including an approximation of Van Benthem Characterisation theorem) are discussed later.

3.3 Finite Model Property

There are two classical approaches to constructing finite models using model theory, namely via selection and via filtration. The complicated one is the former: Given , where \(\phi \) has degree k, we can construct , and consecutively, such that is the finite model we want, where:

  • is the tree-like model obtained from Proposition 2 with root such that .

  • is the restriction of to height k.

  • is obtained from by modifying the valuation into where is the set of all propositional letters used by .

The construction of requires a lemma:

Lemma 1

[1, Proposition 2.29]

figure ck

The proof of Lemma 1 further relies on the following fact: Given a set of modal-formulas that is finite up to equivalence, if we combine the elements of using only connectives other than , then we get only finitely many non-equivalent formulas. To show this, we prove that there is an injection from the set of equivalence classes of such combinations to a finite set. For the antecedent of Lemma 1, we require the assumption that the universe of \(\beta \) is infinite since we rely on the fact that two modal formulas and are equivalent if and only if and are equivalent. This would be easy to prove in set theory. However, in simple type theory, the proof of iff requires us (in the right-to-left direction) to be able to construct a model with a new world inserted, something only sure to be possible if the universe is infinite. As the construction used Proposition 2, we change the type of the model by passing to a finite model via selection:

Theorem 2

[1, Theorem 2.34 (Finite model property, via selection)]

figure cv

We also mechanised the filtration approach, but omit the details for lack of space. The advantage of filtration is that the resulting finite model is over worlds of the same type as in the starting model.

All the results proved above can be captured using models everywhere. If one takes to be (or any infinite type) in Theorem 2, one can also exploit the fact that numbers and lists of numbers have the same cardinality to derive a finite model result that preserves the “input type”.

4 Mechanising Ultrafilters and Ultraproducts

A number of results in Blackburn et al. [1, §2.5–§2.7] rely on theorems about ultrafilters and ultraproducts.

4.1 Ultrafilters

Given a non-empty set J, a set is a filter if it contains J itself, is closed under binary intersection, and is closed upward.

Definition 8

[1, Definition A.12 (Filters)]

figure da

We call L a proper filter if L is not the whole power set. An ultrafilter is a filter U such that for every \(X\subseteq J\), exactly one of X or \(J\setminus X\) is in U. Intuitively, subsets in an ultrafilter are considered as ‘large’ subsets of .

The ultrafilter theorem states that every proper filter is contained in an ultrafilter:

Theorem 3

[1, Fact A.14, first half]

figure de

(The proof uses Zorn’s Lemma.)

A subset A of the power set on J has finite intersection property if once we take the intersection of a finite, nonempty family in A, the resultant set is nonempty.

Definition 9

[1, Definition A.13 (Finite Intersection Property)]

figure df

As a corollary of ultrafilter theorem, a set with finite intersection property is contained in an ultrafilter.

4.2 Ultraproducts

The notion of ultraproducts is defined for sets, modal models, and first-order models.

Ultraproduct of Sets. A family of sets indexed by is a function in HOL. For , is the set indexed by . Given a family indexed by a non-empty set such that each is non-empty, the ultraproduct of is defined as a quotient of the cartesian product of the family.

Definition 10

[1, Page 495 (Cartesian product)]

figure dp

If U is an ultrafilter on J, for two functions fg in the Cartesian product , we say f and g are U-equivalent (notation: ) if the set (where the values of f and g agree) is in U. The ultraproduct of modulo U is the quotient of by .

Definition 11

[1, Definition 2.69 (Ultraproduct of Sets)]

figure dw

We write to denote the equivalence class that f belongs to. In the case where for all \(j\in J\), the ultraproduct is called the ultrapower of modulo U.

Ultraproduct for Modal Models. Given a family of modal models indexed by J and an ultrafilter U on J, the ultraproduct model of modulo U (notation: ) is described as follows:

  • The world set is the ultraproduct of world sets of modulo U.

  • Two equivalence classes \(f_U,g_U\) of functions are related in iff there exist \(f_0\in f_U,g_0\in g_U\), such that is in U.

  • A propositional letter p is satisfied at \(f_U\) in iff there exists \(f_0\in f_U\) such that is in U.

Definition 12

[1, Definition 2.70 (Ultraproduct of Modal Models)]

figure ei

As is an equivalence relation, if one element in an equivalence class satisfies the required condition, then all the elements in the equivalence class will satisfy the condition. Therefore, if we replace all the existential quantifiers with universal quantifiers in the above definition, the construction is still valid, and will give the same model as the current definition.

The critical result we need about ultraproducts of modal models is a modal version of the fundamental theorem of ultraproducts, also known as Łoś’s theorem.

Theorem 4 (Łoś’s theorem, Modal version)

figure ek

According to our intuition about ultrafilters, we can gloss this theorem to mean that the ultraproduct of a family of modal models satisfies a modal formula if and only if ‘most of’ the models in the family satisfy the formula. Though it is possible to derive this result from Łoś’s theorem of first-order models using the standard translation, our proof is direct, by structural induction on .

Ultraproducts for First-Order Models. Given a family of first-order models indexed by J and an ultrafilter U on J, the ultraproduct model of modulo U (notation: ) is given by:

  • The domain is the ultraproduct of the domains of over U on J.

  • A function named by symbol (natural number) n sends a list of equivalence classes to the equivalence class of a function that sends \(j\in J\) to , where the k-th member of the list l is a representative of the k-th member (which is an equivalence class) of .

  • A predicate named by symbol p will hold for a list of equivalence classes if and only if when we have a list , where k-th member is a representative of the k-th member of zs, the set of elements such that is in U.

Definition 13

[1, Definition A.18 (Ultraproduct of First-Order Models)]

figure ex

Here we fix the representative of each equivalence class to be . Therefore, as described above, the functions and are:

figure fc

The semantic behavior of ultraproduct models is characterised by Łoś’s theorem: for the ultraproduct of a family of first-order models over an ultrafilter on J, a formula is satisfied under a valuation if and only if the set indexing the models in the family where is true under the valuation is in the ultrafilter U.

Theorem 5

[1, Theorem A.19 (Łoś’s theorem)]

figure fk

where means the functions of never map a list out of the domain of .

5 Ultrafilter Extensions

The first application of the theory of ultrafilters above is to construct the ultrafilter extension of a model, which has the nice property of being modally saturated (m-saturated hereafter). To define m-saturation, we give the following three definitions (the first two are called finitely satisfiable, satisfiable) consecutively:

Definition 14

[1, Definition 2.53]

figure fo

For m-saturated models, bisimulation and modal equivalence coincide:

Proposition 3

[1, Proposition 2.54]

figure fp

Given a model and a set of worlds of , the set of worlds that ‘can see’ X (notation: ) is the set of worlds of such that there exists some such that . We define the ultrafilter extension of as:

  • The world set is the set of all ultrafilters on .

  • Two ultrafilters , on are related in the ultrafilter extension of if for every , the set of worlds that can see is in .

  • A propositional letter p to be satisfied at an ultrafilter v if and only if the set of worlds in which satisfies p is in v.

In HOL:

Definition 15

[1, Definition 2.57 (Ultrafilter Extension)]

figure gj

Using the ultrafilter theorem and some basic properties about ultrafilters, we derive:

Proposition 4

[1, Proposition 2.59 (i)]

figure gk

In particular, every world is embedded as the principal filter on generated by in the ultrafilter extension or . Also, the above leads to the proof of the fact that the ultrafilter extension of every model is m-saturated. The m-saturatedness of ultrafilter extensions together with Proposition 3 immediately gives the central result about ultrafilter extension: bisimilarity of worlds in a model can be characterised as bisimilarity in .

Theorem 6

[1, Proposition 2.62]

figure gs

6 Countable Saturatedness of Ultrapower Models

Given a first-order model with no information about interpretation of its function symbols, we can expand the model by adding an interpretation of some function symbols. For our purpose, we are only interested in adding the interpretation of finitely many nullary function symbols, also called constants. We write to denote the model that is the result of adding each element in A to as a new constant. Further, the function f is a bijection between \(\{0,\cdots ,n-1\}\) and A, which is assumed to be finite, so that each nullary function symbol c will be interpreted as in .

Definition 16

[1, Definition A.9 (Expansion)]

figure gz

As is apparent from the definition, the only difference between a model and its expansion is the interpretation of function symbols.

A set of first-order formulas is called consistent with a model if for every finite subset , there exists a valuation of such that all elements of are satisfied, in this case, we write . A set of first-order formula is an x-type if for each formula in , the only free variable that may contain is x. In this case, we write ‘ ’ in HOL. If is an x-type, when evaluating formulas in , the valuations will only control where the only free variable x goes to. We say is realised in if there is an element w in the domain of such that for all . In this case, we write ‘ ’ in HOL. Let be a model and be a natural number. If for every with \(|A|<n\) and every \(f:{\mathbb N}\rightarrow \) , the model realises every x-type that is consistent with , then we say is n-saturated. In HOL:

Definition 17

[1, Definition 2.63 (n-Saturated)]

figure hz

We say is countably saturated if is n-saturated for every natural number n. The ultimate goal is to prove a lemma to be used in the proof of Van Benthem characterisation theorem: For a family of non-empty models, their ultraproduct on a countably incomplete ultrafilter is countably saturated.

Lemma 2

[1, Lemma 2.73]

figure ic

Here a countably incomplete ultrafilter is an ultrafilter that contains a countably infinite family that intersects to the empty set. We prove in HOL that such ultrafilters do exist using Theorem 3. The above theorem is not simply a direct consequence of Łoś’s theorem: that result is about ultraproducts of first-order models, and it says nothing about expansion. But to prove Lemma 2, we must prove a statement for an expanded first-order model, and this first order model is itself obtained by converting a ultraproduct of modal models.

To deal with this issue, the key observation is that constants are nothing more than forcing some symbols to be sent to some points in a model under every valuation, hence rather than use nullary function symbols, we fix a set of variable letters, each corresponding to a function symbol, and only consider the valuations that send these variable letters to certain fixed points. With this idea, we can remove all the constants in a formula, and hence change our scope from an expanded model back to the unexpanded model. To get rid of the constants \(\{0,\cdots , n-1\}\), we replace every with , and replace every constant by . This operation is done by the function which takes a natural number (the number of constants we want to remove), and a first-order formula (where the only function symbols may appear are the constants \(0,\cdots , n-1\)). Since \(0,\cdots ,n-1\) in a shifted formula are now designed to be sent to fixed places \(f \ 0,\cdots ,f \ (n-1)\), it does not make sense to assign these variable symbols anywhere else. Therefore, to talk about evaluation of shifted formula, the first thing is to make sure that the valuations we are considering send the variables which actually denote constants to the right place. Hence we shift the valuations accordingly, and then prove that a formula is satisfied on an expanded model is satisfied under a valuation if and only if the shifted formula is satisfied under the shifted valuation. Also, we prove that ‘taking the ultraproduct first-order model commutes with the convertion from modal to a first-order model on certain formulas’, in the sense that the resulting models satisfies the same first-order formulas without function symbols. By putting these two results together, we prove Lemma 2 using the proof in Chang and Keisler [3].

7 Van Benthem’s Characterisation Theorem

Note that the standard translation of any modal formula can only contain unary predicate symbols which correspond to propositional letters, one binary predicate symbol which corresponds to the relation, and no function symbols. A first-order formula which only uses these symbols is called an \({\mathcal L}_{\tau }^1\)-formula. An \({\mathcal L}_{\tau }^1\)-formula which contains only one free variable is called invariant under bisimulation if for all models and with and , if there exists a bisimulation relation between and relating w and v, then \(\phi \) holds at if and only if it holds at when both and are viewed as first-order models.

Definition 18

[1, Definition 2.67 (Invariant for Bisimulations)]

figure is

Because of the same problem we met when defining equivalence of formulas, the type parameters are necessary here. However, although it is possible to prove theorems for different types \(\alpha \) and \(\beta \) in the above definition, in the theorems to come, we will only consider the case where \(\alpha \) and \(\beta \) are the same.

The Van Benthem characterisation theorem says an \(\mathcal L_{\tau }^1\) formula with at most one free variable x is invariant under bisimulation precisely when it is equivalent to the standard translation of some modal formula at x. It is immediate from Proposition 1 that every such formula which is equivalent to a standard translation is invariant for bisimulation. We cannot prove it as an ‘if and only if’ statement, since according to the proofs in [1], we can only prove the two directions separately as:

Proposition 5

[1, Theorem 2.68, as two separate directions]

figure it

which cannot be put together into a double implication. To see the reason: given an \(\mathcal L_{\tau }^1\)-formula \(\phi \) with no more then one free variable, by the second theorem above, if is invariant under bisimulation for models with -worlds, then \(\phi \) is equivalent to a standard translation on a model with \(\alpha \)-worlds. However, if we want to prove the converse of this statement, we need to start with the assumption that \(\phi \) is equivalent to a standard translation on models with \(\alpha \)-worlds, and prove that \(\phi \) is invariant for bisimulation for models with -worlds. But by the first theorem above, we can only conclude \(\phi \) is invariant for bisimulation for models of type \(\alpha \). The point is that it is not the fact that all our desired operations can be taken within a type. In particular, we cannot take ultraproducts of models and preserve cardinalities. The cardinality of the type universe of is too large to be embedded into \(\alpha \), so we cannot just fix the ‘base type’ to be and get an ‘if and only if’ statement-we cannot derive \(\phi \) is invariant for bisimulation for models with -worlds from the fact that \(\phi \) is invariant for bisimulation for models with \(\alpha \)-worlds. If we could quantify over types (as we could in a theorem prover based on dependent type theory), then we can could define ‘invariant under bisimulation for models of every type’, and hence prove the original statement of Van Benthem characterisation theorem.

For the proof of the two theorems above, the first one is immediate from Proposition 1, and the second one requires another critical lemma saying ‘modal equivalence between two worlds implies bisimilarity of the two worlds when embedded in some other models’. More precisely, if two worlds and are modal equivalent, then we can find an ultrafilter U on J such that in ultrapower models of and on U respectively, there is a bisimulation between the worlds corresponding to and .

Theorem 7

[1, Theorem 2.74, one direction]

figure jg

The proof of the above relies on Lemma 2.

8 Conclusion

To summarise, we have mechanised all of the results (appearing as propositions, lemmas and theorems) in the first two chapters in Blackburn et al. [1] that can be captured by the HOL logic, and which are about the basic modal language. The exceptions are:

  • The result in Sect. 2.6 about ‘definability’, which requires a definition of the ‘models closed under taking ultraproducts’. Simple type theory cannot capture such large sets.

  • The result about ‘safety’ in Sect. 2.7 is a result about the PDL language, which has infinitely many modal operators. For the moment, we have restricted our attention to the basic modal language, with only \(\Diamond \) (and the derived \(\Box \)).

The two characterisation theorems from Blackburn et al. [1], namely Theorem 2.68 (Van Benthem’s Characterisation Theorem) and Theorem 2.78, are the only two mechanised theorem such that translating the ‘if and only if’ statements from set theory into simple theory does not yield an ‘if and only if’ statement. Blackburn et al.’s proof of Theorem 2.78 has the same pattern as Van Benthem’s Characterisation Theorem (discussed earlier), and is less complicated.

For each of the mechanised definitions and results, we write the statement in HOL to be as close as possible to the original statement in [1]. We believe that this makes it as easy as possible for people who are interested in mechanising other results in [1] to continue with our work as a starting point. The work on ultraproducts up to Łoś’s theorem is independent of our work on modal model theory, and should be generally useful in other model-theoretic applications.

8.1 Related Work

We believe that we are the first to mechanise the bulk of the results in this paper. Of course, much work has been done in this and similar areas. For example, de Wind’s thesis [7] is a notable early mechanisation of modal logic, mainly focusing on proving the validity of modal formulas via natural deduction. Of similar vintage is Harrison’s mechanisation of foundational results about first order model theory [5], in particular compactness. We used this mechanisation directly in our own work. A great deal of work has also been done in the mechanisation of first order proof theory, such as the recent pearl by Blanchette et al. [2], showing completeness in elegant fashion.

The connections between modal logic and process algebra are well-understood and there has been a great deal of mechanised work on the operational theory of such (co-)algebraic systems, starting at least as far back as Nesi [6]. Our proof of the Hennessy-Milner theorem (Theorem 1) is a gesture in this direction, but Van Benthem’s theorem is much deeper and uses bisimulations as a tool to understanding the connection between modal and first order logics, rather than as a connection to process algebras.

Mechanised work with ultrafilters began with Fleuriot’s use of them to mechanise non-standard analysis [4]. We are unaware of any previous mechanised use of ultraproducts or ultrapowers.