1 Introduction

Naproche (for Natural Proof Checking) is an emerging natural proof assistant that accepts input in a controlled natural language approximating ordinary mathematical language and texts. The system uses

  • the dedicated input language ForTheL (Formula Theory Language),

  • natural language processing for texts with symbolic material,

  • strong automatic theorem proving (ATP) for filling in implicit or obvious proof steps.

The current version of Naproche also introduces a LaTeX dialect of ForTheL so that high-quality mathematical typesetting is readily available. Naproche allows the formalization and proof-checking of advanced mathematics in a style that is immediately readable by mathematicians. Example formalizations from various domains of undergraduate mathematics are included.

Naproche ships as a component in the latest release of the Isabelle prover platform [8]. When editing a ForTheL file in the Isabelle/jEdit Prover IDE (PIDE), an auxiliary server in the background quickly answers requests for checking ForTheL texts, with an internal cache to avoid repeated checking of unchanged text segments. The implementation uses programming interfaces of Isabelle/PIDE that allow user-defined file formats to participate in the concurrent document model. A second auxiliary server allows the program to run external prover processes under the control of Isabelle, with explicit timeouts. This works reliably on the usual platforms (Linux, Windows, macOS) by re-using external provers of Isabelle/Sledgehammer [17]. From the perspective of logic, there is no connection of Naproche with Isabelle/Sledgehammer or any other Isabelle/HOL tools.

In this paper we briefly discuss the need for natural proof assistants, provide some general information on Isabelle/Naproche, and give an overview of methods employed in the system, using an excerpt from a formalization of Euclid’s infinitude of primes as a running example. To conclude, we compare Naproche with other projects in formal mathematics with natural language input and indicate ways to further extend Naproche’s naturalness and efficiency.

2 Natural Proof Assistants

While state-of-the-art interactive theorem provers have been successfully used to prove and certify highly non-trivial research mathematics, they are still, according to Lawrence Paulson [16], “unsuitable for mathematics. Their formal proofs are unreadable.”

Natural proof assistants intend to bridge the wide gap between intuitive mathematical texts and the formal rigour of logical calculi. We propose the following criteria for natural proof assistants:

  • Input languages should be close to the mathematical vernacular, including support for common grammatical conventions and symbolic expressions. These languages should support familiar text structurings, such as the usual definition-theorem-proof style.

  • Proofs should consist of natural argumentative phrases for various proof tactics, allowing for a more declarative style.

  • The system should use familiar logics and mathematical ontologies.

  • Tedious details and obvious proof gaps should be filled in automatically.

  • An intuitive editor should allow for interactive text and theory development, where incremental proof checking can guide the formalization.

We expect that naturalness will be crucial for the adoption of formal mathematics by the wider mathematical community. This is in line with some ongoing large-scale projects in formal mathematics. For instance, the ALEXANDRIA project by Paulson [16] stipulates:

ALEXANDRIA will be based on legible structured proofs. Formal proofs should be not mere code, but a machine-checkable form of communication between mathematicians.

The Formal Abstracts project of Thomas Hales [5] intends to

  • give a statement of the main theorem of each published mathematical paper in a language that is both human and machine readable,

  • link each term in theorem statements to a precise definition of that term (again in human/machine readable form).

3 Isabelle/Naproche

The Naproche proof assistant stems from two long-term efforts aiming towards naturalness: the Evidence Algorithm (EA) and System for Automated Deduction (SAD) projects at the universities of Kiev and Paris [14, 15, 20, 21], and the Naproche project at Bonn [1, 2, 3, 10]. Naproche extends the input language ForTheL of SAD and embeds it into LaTeX, allowing mathematical typesetting; the original proof-checking mechanisms of SAD have been made more efficient and varied.

The first experimental integration of the then Naproche-SAD prover into the Isabelle Prover IDE was done in 2018 by Frerix and Wenzel [23, §1.2]. The current (refined and extended) version has now become a bundled component of Isabelle2021 [8]. After downloading and unpacking the Isabelle distribution, Isabelle/Naproche becomes immediately accessible in the Documentation panel, section Examples, entry $ISABELLE_NAPROCHE/Intro.thy. Isabelle and its add-on components work directly without manual installation, but this comes at the cost of substantial resource requirements: on Linux the total size is 1.2 GB, which includes Java 15 (330 MB), E prover 2.5 (30 MB), and Naproche (20 MB). The bulk of other Isabelle components are required for Isabelle/HOL theory and proof development, but Naproche has no logical connection to that.

The prover is invoked automatically when editing ForTheL files with .ftl or .ftl.tex extensions. Further examples and an introductory tutorial are linked in the Isabelle theory file $ISABELLE_NAPROCHE/Intro.thy: as usual for Isabelle/jEdit and other IDEs, following a link works by a mouse click combined with the keyboard modifier CTRL (Linux, Windows) or CMD (macOS). The examples deal with results from undergraduate number theory, geometry, and set theory; most are available in the classic ASCII style as well as in LaTeX style and typeset in PDF.

The ForTheL library FLib [13] contains a variety of formalizations for earlier versions of Naproche. Some substantial texts have been written as undergraduate student projects and cover, e.g., group theory up to Sylow theorems, initial chapters from Walter Rudin’s Analysis, or set theory up to Silver’s theorem in cardinal arithmetic. These texts will soon be upgraded to the new version of Naproche and included in an interlinked formalized library of readable and proof-checked mathematical texts.

4 Formalizing in ForTheL

4.1 Example

The following screenshot shows a proof of the infinitude of prime numbers in the Isabelle/Naproche Prover IDE taken from the bundled tutorial which itself is a proof-checked ForTheL text:

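[Figure: screenshot of the Isabelle/Naproche Prover IDE showing the ForTheL proof of the infinitude of primes]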

The editor buffer contains the ForTheL source, which also happens to conform to standard LaTeX format. (The “Contradiction” lemma, now deactivated by a %, is a left-over of a typical check for hidden inconsistencies in the axiomatic setup.) The Output panel contains feedback from the prover about the source document: “verification successful” and some statistics; the most relevant messages are also shown in-line over the source as squiggly underlines with popups on mouse-hovering. The Sidekick/latex structure overview is provided by standard plugins of the underlying text editor. This piece of mathematics is typeset by LaTeX as follows:

[Figure: the proof of the infinitude of primes as typeset by LaTeX]

4.2 The ForTheL Language

The mathematical controlled language ForTheL has been developed over several decades in the Evidence Algorithm (EA) / System for Automated Deduction (SAD) project. It is carefully designed to approximate the weakly typed natural language of mathematics whilst being efficiently translatable to first-order logic. In ForTheL, standard mathematical types are called notions, and these are internally represented as predicates with a distinguished variable, which are treated as unary predicates with the other variables used as parameters (“types as predicates”). This leads to a flexible dependent type system where number systems can be cumulative (\(\mathbb {N} \subseteq \mathbb {R}\)), and notions can depend on parameters (subsets of \(\mathbb {N}\), divisors of n).
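As a hedged illustration of “types as predicates” (the predicate names nat, real, prime and div are chosen here for exposition only and are not the identifiers used internally by the system), the cumulativity of number systems and a statement about the parametrized notion “divisor of n” could be rendered in first-order logic as follows, with the first argument of div playing the role of the distinguished variable and n acting as a parameter:

\[ \forall x\, \bigl( \mathrm{nat}(x) \rightarrow \mathrm{real}(x) \bigr) \]

\[ \forall x\, \bigl( \mathrm{prime}(x) \wedge \mathrm{div}(x, n) \rightarrow \mathrm{nat}(x) \bigr) \]

The first formula expresses \(\mathbb {N} \subseteq \mathbb {R}\); the second reads “every prime divisor of n is a natural number”.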

First-order languages of notions, constants, relations, and functions can be introduced and extended by signature and definition commands. The formalization of Euclid’s theorem, e.g., sets out like:

[Listing: opening of the ForTheL formalization of Euclid’s theorem]

5 Architecture of the System

Naproche follows standard principles of interactive theorem proving, but with a strong emphasis on the naturalness aspects explained above. The general information processing in the system is described in the following diagram. The core program is implemented in Haskell.

[Figure: diagram of the general information processing in the system]

In the sequel we shall describe the main components of Naproche.

5.1 Tokenizing and Parsing

Naproche uses a standard tokenizing algorithm for cutting text up into a list of meaningful tokens, with precise source positions to enable PIDE messages and markup, e.g., by colours for free and bound variables. When using LaTeX syntax, the tokenizer also takes care of expanding certain commands (see the next subsection).
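The idea of tokens that carry source positions can be sketched as follows. This is a minimal Python illustration, not the Haskell tokenizer of the system; the `Token` class, the `tokenize` function and the token pattern are assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    line: int    # 1-based line number in the source
    column: int  # 1-based column number in the source


# Hypothetical token pattern: alphanumeric words, or any single
# non-space character (punctuation, symbols).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|\S")


def tokenize(source: str) -> list[Token]:
    """Cut the text into tokens, recording where each one starts."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            tokens.append(Token(match.group(), line_no, match.start() + 1))
    return tokens


tokens = tokenize("Let n be\na number.")
```

Because every token remembers its line and column, later stages can attach messages or markup to the exact place in the editor buffer that gave rise to them.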

Parsing is carried out in Haskell’s monadic style with parser combinators. We allow ambiguous parsing, since it better fits natural language. Currently the translation into tagged first-order logic is already part of the parsing process. The following translation of our example snippet was obtained by running Naproche from the command line with the -T (translate) option:

[Listing: translation of the example snippet into tagged first-order logic]
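Ambiguous parsing with combinators can be illustrated by the classical “list of successes” technique: a parser returns every possible reading together with the remaining input. The sketch below is written in Python rather than Haskell, and the combinators `token`, `alt`, `seq` and `complete` as well as the toy grammar are hypothetical; they are not taken from the Naproche sources.

```python
def token(kind, *words):
    """Parser accepting one of the given words and tagging it with a kind."""
    def parse(tokens):
        if tokens and tokens[0] in words:
            return [((kind, tokens[0]), tokens[1:])]
        return []
    return parse


def alt(*parsers):
    """Try all alternatives and keep every successful reading."""
    def parse(tokens):
        return [result for p in parsers for result in p(tokens)]
    return parse


def seq(p, q, combine):
    """Run p, then q on the rest, combining the two results."""
    def parse(tokens):
        return [(combine(a, b), rest2)
                for a, rest1 in p(tokens)
                for b, rest2 in q(rest1)]
    return parse


def complete(parser, tokens):
    """Keep only the readings that consume the whole input."""
    return [value for value, rest in parser(tokens) if not rest]


# Toy grammar: the word "a" is either an article or a variable name.
article = token("article", "a")
variable = token("variable", "a", "b", "n")
noun = token("noun", "number")
noun_phrase = seq(article, noun, lambda _art, n: ("noun_phrase", n[1]))
term = alt(noun_phrase, variable)

readings = term(["a", "number"])
```

Here `term` yields two readings of the input, and only the noun phrase reading survives once the whole input has to be consumed. Deferring the choice in this way is what makes ambiguous grammars manageable.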

In order to make Naproche more versatile we plan on parsing into an abstract syntax tree instead, so that different logical back-ends could translate into different logics. We have already made some experiments on translating ForTheL to Lean [12].

Moreover, with the input language growing, we shall eventually turn to some grammatical framework to speed up language development without hard-coding vocabulary or grammar rules into the code.

5.2 LaTeX Processing

We have extended Naproche to support a .ftl.tex format, in addition to the original .ftl format. Files in .ftl.tex format are intended to be readable both by Naproche for logical checking and by LaTeX for typesetting.

The tokenizer ignores the whole document, except what is inside forthel environments of the form

[Listing: a forthel environment]

In a forthel environment, standard LaTeX syntax can be used for declaring text environments for theorems and definitions.
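The filtering step can be sketched as follows, assuming that forthel environments are written with the usual LaTeX delimiters `\begin{forthel}` and `\end{forthel}`. The function name `forthel_blocks` is hypothetical, and the sketch is in Python rather than in the Haskell of the actual tokenizer; unlike the real tokenizer, it does not keep track of source positions.

```python
import re

# Non-greedy match of everything between the environment delimiters;
# DOTALL lets the body span several lines.
FORTHEL_ENV = re.compile(r"\\begin\{forthel\}(.*?)\\end\{forthel\}", re.DOTALL)


def forthel_blocks(tex_source: str) -> list[str]:
    """Return the bodies of all forthel environments, ignoring the rest."""
    return [m.group(1).strip() for m in FORTHEL_ENV.finditer(tex_source)]


document = r"""\documentclass{article}
\begin{document}
Informal remark, ignored by the checker.
\begin{forthel}
Let n be a number.
\end{forthel}
More informal text.
\begin{forthel}Then n = n.\end{forthel}
\end{document}"""

blocks = forthel_blocks(document)
```

Only the two environment bodies are passed on for checking; the preamble and the informal remarks are left to LaTeX alone.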

In Naproche, users can define their own operators and phrases by defining linguistic and symbolic patterns. This mechanism has been adapted to allow LaTeX constructs in patterns. In the Euclid text we use such a pattern for the finite set \(\{p_1,\dots ,p_r\}\). By defining the underlying command as a LaTeX macro, we can arrange that the ForTheL pattern is printed in the familiar set notation:

[Listing: pattern and macro for the finite set notation]

There are some primitive concepts in Naproche, such as the logical operators \(\vee \), \(\wedge \), \(\exists \), that are directly recognized in the LaTeX source and expanded to corresponding internal tokens.

The current release of Naproche does not differentiate between math mode and text mode in LaTeX, since it re-uses much of the parsing machinery of the original .ftl format. Future releases shall make such a distinction to increase the robustness of the parser, improve error messages and resolve some ambiguities in the current grammar.

5.3 Logical Processing

The first-order formulas derived from ForTheL statements are put into an internal ProofText data type consisting of blocks of formulae, arranged in a tree-like fashion. The tree structure mirrors the logical structure of a text, where a statement can be seen as a node to which a subtext, e.g., its proof is attached. Since statements in a proof can have their own subproofs this leads to a recursive tree structure, on which the further checking is performed along a depth-first left-to-right traversal.
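A tree of blocks with a depth-first, left-to-right traversal can be sketched as below. The `Block` class and its field names are hypothetical stand-ins for the ProofText data type, the sketch is in Python rather than Haskell, and visiting a node before its subtext is an assumption of this example.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Block:
    formula: str
    # Subtext attached to the statement, e.g. its proof.
    subtext: list["Block"] = field(default_factory=list)


def traverse(blocks: list[Block], depth: int = 0) -> Iterator[tuple[int, Block]]:
    """Depth-first, left-to-right walk over the text tree."""
    for block in blocks:
        yield depth, block
        yield from traverse(block.subtext, depth + 1)


text = [
    Block("axiom A"),
    Block("theorem T", [
        Block("step 1", [Block("substep 1a")]),
        Block("step 2"),
    ]),
]

order = [block.formula for _, block in traverse(text)]
```

The nesting depth returned alongside each block reflects how subproofs are attached to the statements they justify.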

5.4 Ontological Checking by the Reasoner

An innocent mathematical statement like \(a^2+b^2=c^2\) contains a number of implicit proof tasks, even if the whole statement is not to be proved, but is part of a definition or an assumption. One has to check that \(a\), \(b\), \(c\) are (numerical) terms to which the squaring operation can be applied, and that the resulting squares can be subjected to addition and equality. These checks are called “ontological”, and they roughly correspond to type checking in type-orientated systems. The situation here is however more complicated, as types (i.e. notions) and operations may involve first-order definitions with preconditions, which cannot be decided during the parsing process but only during proof-checking. So in the checking process each node of the aforementioned tree is first checked ontologically; if the node formula itself is marked as a conjecture, it is then also checked logically.

5.5 Logical Checking by the Reasoner

The various checks are organized by the reasoner module. In simple cases the reasoner itself can supply a proof; if not, the reasoner constructs proof tasks for the ATP. Since definitions in first-order logic are formally symmetric equivalences, they may lead to circularities in proof searches. Instead, definitions are successively unfolded by replacing the definiendum by the definiens. This process may be iterated when proof attempts fail.

The ATP is given certain timeouts to search for proofs. Ontological checking is supposed to be easier than proper mathematical proving. So the default time for each ontological check is set to 1 sec, whereas proving gets 3 sec and can be iterated for several rounds of definition unfolding.
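The interplay of timeouts and iterated definition unfolding can be sketched as a simple loop. Only the two default timeouts are taken from the text; the function names, the purely textual unfolding, the limit of three rounds and the toy prover are assumptions of this Python illustration and do not describe the actual reasoner.

```python
ONTOLOGICAL_TIMEOUT = 1.0  # seconds per ontological check (default from the text)
PROOF_TIMEOUT = 3.0        # seconds per proof attempt (default from the text)


def unfold(goal: str, definitions: dict[str, str]) -> str:
    """Crude textual unfolding: replace each definiendum by its definiens."""
    for definiendum, definiens in definitions.items():
        goal = goal.replace(definiendum, f"({definiens})")
    return goal


def check(goal, definitions, prover, max_rounds=3):
    """Return the number of unfolding rounds needed, or None on failure.

    `prover(goal, timeout)` stands for a call to an external ATP and
    returns True if a proof was found within the timeout.
    """
    for round_no in range(max_rounds + 1):
        if prover(goal, PROOF_TIMEOUT):
            return round_no
        unfolded = unfold(goal, definitions)
        if unfolded == goal:  # nothing left to unfold
            break
        goal = unfolded
    return None


# Toy prover: succeeds only once the defined symbol has been unfolded.
def toy_prover(goal, timeout):
    return "even(" not in goal


definitions = {"even(n)": "exists k. n = 2*k"}
rounds = check("even(n)", definitions, toy_prover)
```

In this toy run the first attempt fails, one round of unfolding exposes the definiens, and the second attempt succeeds.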

5.6 Communication with an External ATP

Proof tasks are translated into the generic TPTP first-order format for ATPs. These can be viewed in the Output window of Isabelle/jEdit, after inserting the directive [dump on] into the ForTheL source. The final proof task in checking Euclid’s proof ends with the TPTP lines:

[Listing: final TPTP lines of the proof task for Euclid’s proof]
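For readers unfamiliar with TPTP, the general shape of a first-order proof task can be sketched as follows: each premise becomes an annotated formula with role axiom, and the statement to be proved gets the role conjecture. The helper functions, formula names and predicates below are hypothetical Python illustrations and are not the output produced by the system.

```python
def fof(name: str, role: str, formula: str) -> str:
    """Render one annotated first-order formula in TPTP 'fof' syntax."""
    return f"fof({name}, {role}, ({formula}))."


def proof_task(premises: dict[str, str], conjecture: str) -> str:
    """Render premises as axioms followed by the conjecture."""
    lines = [fof(name, "axiom", formula) for name, formula in premises.items()]
    lines.append(fof("goal", "conjecture", conjecture))
    return "\n".join(lines)


# In TPTP, variables start with an upper-case letter, "!" is the
# universal quantifier and "=>" is implication.
task = proof_task(
    {
        "prime_is_nat": "! [X] : (prime(X) => nat(X))",
        "p_is_prime": "prime(p)",
    },
    "nat(p)",
)
```

The resulting text has one `fof(...)` entry per line and can be handed to any ATP that reads the TPTP first-order format.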

By default Naproche uses E prover [19] as external ATP, but one may switch to other provers available in the Isabelle distribution.

6 Integration into Isabelle

The initial integration of Naproche into the Isabelle Prover IDE happened in 2018 and is briefly reported as an example in the PIDE overview article [23] based on Isabelle2019 (June 2019). The main idea was to turn the existing Haskell command-line program into a TCP server that can answer concurrent requests for checking ForTheL texts in a purely functional manner, with proper handling of cancel messages (for interrupts caused by user editing); this required removing a few low-level system operations, like reading physical files or exiting the process. Afterwards, the semantic operation forthel_file in Isabelle – to check ForTheL text and produce markup messages according to the PIDE protocol – was implemented as an Isabelle/Isar command in Isabelle/ML as usual, but the main work is delegated to the server. Its implementation uses the Isabelle/Haskell library for common Isabelle/PIDE message formats, source positions, markup etc.; this library is maintained within the Isabelle distribution.

The current version of Isabelle/Naproche refines this approach in various respects. In particular, Isabelle2021 now provides a standard mechanism for user-defined Isabelle/Scala services: this is relevant both for the Isabelle command-line tools that build and test Isabelle/Naproche, and for the Prover IDE support of ForTheL files, which connects the Isabelle/jEdit front-end to the back-end.

Moreover, the Java process running the Prover IDE provides an additional TCP server to launch external provers that are already distributed with Isabelle (thanks to Isabelle/Sledgehammer): applications mainly use the current E prover 2.5 [19], but SPASS and Vampire are available for experiments. The existing management of processes in Isabelle/Scala involves considerable efforts to robustly support interrupts and timeouts in a concurrent environment; this works on all platforms supported by Isabelle (using special tricks for Windows/Cygwin, and macOS/Rosetta on Apple Silicon).
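The basic pattern of running an external process under an explicit timeout can be sketched as below. This Python fragment only illustrates the principle; the actual process management is part of Isabelle/Scala and is considerably more robust. The function `run_prover` is hypothetical, and the commands used in the example are stand-ins started with the Python interpreter, not real prover command lines.

```python
import subprocess
import sys


def run_prover(command, problem, timeout_seconds):
    """Run an external process on the given input.

    Returns the process output, or None if the timeout expired
    (in which case the child process is killed).
    """
    try:
        result = subprocess.run(
            command,
            input=problem,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Stand-in "prover" that answers immediately.
quick = run_prover([sys.executable, "-c", "print('proof found')"], "", 30)

# Stand-in "prover" that runs longer than its time budget.
slow = run_prover([sys.executable, "-c", "import time; time.sleep(5)"], "", 0.5)
```

A caller can thus treat a timeout like any other failed proof attempt and decide whether to retry, for instance with a different prover or a modified proof task.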

The documentation file $ISABELLE_NAPROCHE/Intro.thy gives further hints on implementation near the end, with hyperlinks to the sources. A lot of technical Isabelle infrastructure is re-used by Isabelle/Naproche, but there is presently no connection to Isabelle/HOL, which is a much larger and better-known application of the same Isabelle framework [18].

7 Related and Future Work

Bridging the gap between mathematical practice and fully formal methods has always been a central concern in formal mathematics. The development of the Mizar system [11] was accompanied or even driven by the stepwise adaptation of its language to standard mathematical proof methods and logical foundations. In contrast, most interactive theorem provers feature formal tactic languages, with tactics scripts that can hardly be understood without stepwise tracing and reconstructing internal logical states.

The Mizar language has been a role model for other proof languages. There are, e.g., “Mizar modes” for HOL [6, 25] and Coq [4] and the widely used Isar language for Isabelle [22, 24]. These languages can be read by mathematicians, with some effort, but they retain a strong bias toward computer science customs. A survey of input languages for formalization on a scale between formal and natural can be found in [9].

Only a few formal mathematics projects have aimed at processing actual mathematical language. These projects have operated in isolation and seem to be mostly inactive now. The paper [7] by Muhammad Humayoun and Christophe Raffalli, e.g., describes the MathNat project and also surveys other related attempts.

The Naproche approach can be viewed in the Mizar tradition: use a rich controlled language for mathematics, increase the proving capabilities by strong automated theorem proving, and, eventually, create an extensive library of basic mathematics and specialized theories, which simultaneously can be used as a library for human readers.

The readability and naturalness of texts which proof-check in the system motivate significant further extensions of the project where ad hoc methods are to be replaced by principled and established approaches:

1. the input language ForTheL has to be extended for wide mathematical coverage; ForTheL needs an extensive formal grammar and vocabulary to be processed by strong linguistic methods; the vocabulary may also encompass standard symbols and semantic information;

2. methods of type derivation and elaboration should be provided;

3. Isabelle/Sledgehammer-like methods should lead to efficient premise selection in large texts and theories;

4. the creation of libraries of ForTheL documents requires import and export mechanisms corresponding to quoting and referencing in the mathematical literature;

5. the natural text processing of Naproche should be interfaced with other proof assistants to leverage their strengths and libraries. We shall in particular work on a “Naproche mode” for Isabelle.