1 Introduction

Writing a mathematical proof could be compared to making a beautiful carpet featuring elaborate patterns: the end result looks impressive, mind-boggling even, but the actual process in fact consists of simple arguments, like small stitches. While formalisingFootnote 1 mathematics with a proof assistant (also referred to as interactive theorem prover), that is, writing proofs in a formal language so that a computer can verify the correctness of a proof, every simple argument should be made explicit: it can be a painstaking process, as every step that is trivial to a human may need to be analysed in a number of elementary sub-steps. But this process could be invaluable to students (and sometimes even to professional mathematicians) who can gain a deeper understanding of a proof. This is of course far from the only motivation to formalise mathematics with a proof assistant: verification of proofs, as well as contributing to the constantly growing libraries of formal proofs that can be used for creating tools to assist mathematicians in their creative work, are also very important motives (a discussion on this will follow in Sect. 2).

It should be clarified that proof assistants are not the same as computer algebra tools, such as MATLAB or Mathematica: the latter mainly do computations, while the former mainly do reasoning. Proof assistants can be used to verify various algorithms that may have many different applications in theoretical computer science and in industry too, but here we discuss the use of proof assistants for formalising and verifying mathematical proofs in particular. Various different proof assistants that are based on different formal systems have been developed over the years. A comprehensive review comparing different theorem provers (as of 2006) edited by Freek Wiedijk and forwarded by Dana S. Scott is a useful reference for interested users [65]; seventeen systemsFootnote 2 are showcased by presentation and direct comparison of formalised proofs. Since then, the state-of-the-art has improved with libraries of formal proofs having expanded significantly and with the new proof assistant Lean having entered the picture as well.

In an online list [66], Freek Wiedijk is keeping track of which theorems, out of a list of 100 significant mathematical theorems, have been formalised in which proof assistant, considering tenFootnote 3 different systems. As of today, 95 out of the 100 theorems of the list have been formalised in at least one of these ten systems, and Isabelle comes second after HOL Light in the list with 83 out of the 100 theorems having been formalised in Isabelle.

Nowadays, the most popular proof assistants for formalising mathematics are Isabelle, Coq, Lean, Mizar, HOL4, HOL Light, Agda and Metamath, but this article, as the title suggests, focusses on Isabelle/HOL (: Higher Order Logic) in particular. Isabelle was first developed by Lawrence Paulson and Tobias Nipkow [46, 51] and supports different logical formalisms, such as first and higher-order logic as well as Zermelo Fraenkel set theory (Isabelle/ZF). Isabelle/ZF and Isabelle/HOL share the same basic inference procedures and user interface, however Isabelle/HOL is more developed and, as we will see, features useful automation tools.

Proof assistants are becoming more and more popular among younger mathematicians and students. For example, note the prize-winning project by a group of undergraduate students from Bremen formalising Matiyasevich’s proof of Hilbert’s 10th problem with Isabelle/HOL [54, 58], as well as the vibrant community of young mathematicians and undergraduate students working under the direction of Kevin Buzzard at Imperial College London with Lean [7]. This enthusiasm has in part been fuelled by several important milestones, such as the verification of the proof of the four-colour theorem in Coq by Georges Gonthier [12], the verification of the proof of the Kepler conjecture [16] in HOL Light and Isabelle by Thomas Hales et al. [17] as well as the formalisation of Gödel’s incompleteness theorems using Isabelle/HOL by Lawrence Paulson [48, 49].

Hoping that this may be of help to students interested in formalisation of mathematics and automated reasoning, I am sharing this reportFootnote 4 on my comments (mostly focussing on the difficulties) on my experience with Isabelle/HOL working within the ALEXANDRIA project at the University of Cambridge, as a pure mathematician who had no prior formalisation experience. My goal is to give an overview of my experience with Isabelle/HOL so far and of my views on the formalisation of mathematics, and to share some instructions on getting started with Isabelle as well as some first observations which I made, both technical and conceptual, that might prove to be helpful to students and beginners in their early learning stages.

The plan of this paper is as follows: in Sect. 2, I discuss the main motives for formalising mathematics also commenting on my own motivation and research background and I introduce the ALEXANDRIA project at the University of Cambridge. In Sect. 3, I give a brief introduction on how to get started with Isabelle/HOL. In Sect. 4, I discuss the first difficulties I encountered with Isabelle/HOL. In Sect. 5, I give an overview of the first projects I worked on as case studies to explore what Isabelle/HOL can and cannot do. Section 6 is to be read as a caution to new users showing how to avoid making a bad use of a proof assistant, in particular I give examples of different kinds of mathematical mistakes and show how Isabelle would not detect them if the user is not making a responsible use of it. Finally, Sect. 7 involves a comment on the main two different aspects of revolutionising mathematical practice through the mechanisation of mathematics.

2 Motivation for Formalising Mathematics and the ALEXANDRIA Project

2.1 Why Formalise Mathematics?

However one would like to describe mathematical practice, it is certainly not the same as programming (nor should it be reduced to it). As computers and artificial intelligence are becoming more and more integrated into every aspect of our lives, “modernising” mathematical practice in that respect does not sound unexpected, but this is not an issue of modernisation for the sake of it (although for some people this would be a sufficient reason); in fact, formalising mathematics offers many direct benefits. There are different approaches to making a choice about what material to formalise. These are: (a) formalising the mathematical curriculum, that is, basic material that undergraduate students are usually taught; (b) formalising advanced, famous results, e.g. the aforementioned formalisations of Gödel’s incompleteness theorems, the four-colour theorem and Hilbert’s 10th problem; (c) formalising new research results, that can be either from mainstream research papers or that could be considered groundbreaking (like the aforementioned proof of the Kepler conjecture). To the above we can add a fourth strategy, which is not quite achievable with the current state-of-the-art but is one of the main goals for the future of the field: (d) discover new mathematical results through the process of formalising.Footnote 5 To this end, advances in machine learning are widely regarded as necessary. Timothy Gowers has described how an interactive assistant would ideally “converse” with a human mathematician to assist the discovery of new results [14]. But, today, we are still far from this goal.

There are different expected benefits that motivate each strategy.

An obvious first reason is verification. This applies mostly to (c), since in all other cases the material has been checked by a great number of people over the years and we wouldn’t expect to find mistakes in elementary material. With more advanced and more recent results the probability of finding mistakes of course starts to increase. A recent example in the theory of Gromov-hyperbolic spaces is a mistake found through formalisation in Isabelle by Gouëzel and Shchur [13].

Another reason, applying mostly to (a) and (b), is that contributing to the (very fast growing) libraries of formal proofs amounts to the creation of a database with a huge potential. A “physical” mathematical library consisting of “material” books, or even an online library consisting of pdf files, is more restrictive – while here we have to do with a library written in code: something we can modify, interact with, reuse. More importantly, formalised material can be used to create tools for goal (d). A vision for the future is the creation of an interactive assistant that would provide “brainstorming” tips to research mathematicians in real time assisting them in the process of discovering (or inventing) a new result.

A third reason, applying to all (a), (b), (c) (and (d)), is that the process of formalising in itself can help the user gain brand new insights even in already familiar topics. To a large extent this is because of the high level of detail in which a formalised proof must be written, but also because using new tools forces to look at familiar material from a new angle.Footnote 6

Last but not least, formalisation can also serve educational purposes (this applies mostly to (a)).

I will not elaborate further on a general discussion about this topic – I rather restrict to mentioning the inspiring papers by Avigad [3], Paulson [50], Lamport [42], Barendregt and Wiedijk [4] and the classical QED Manifesto from the mid-90’s [6]. Pioneering mathematicians such as the Fields medalists Timothy Gowers [14] and Vladimir Voevodsky [61] and more recently Kevin Buzzard [7,8,9, 11], Peter Koepke [24] and Benedikt Löwe [43], among others, have been very strong advocates for the use of proof assistants and formal verification in mathematics.

2.2 More Personal Motivation and My Mathematics Background

As a pure mathematician with some background in logic and proof theory, my own interest in the topic was initially driven not only by a fascination for the emerging culture of re-imagining mathematical practice in the light of new AI developments but also by philosophical questions on the nature of mathematical proofs, e.g. when encountering in my own research work different proofs of similar statements giving completely different computational content [29, 31].

In particular, regarding my mathematics background, my PhD research [32] was pen-and-paper work and involved applications of proof theory to mathematics (mainly in nonlinear analysis) [28,29,30,31]. I have been working within Ulrich Kohlenbach’s proof mining programme [25,26,27] that involves pen-and-paper extraction of constructive/quantitative information from proofs in the form of computable bounds, which requires a logical analysis of a proof and rewriting it to make the logical form of all the statements involved explicit via revealing the hidden quantifiers. It is only natural that such a line of research would provoke the question:

What is it that makes a “good” proof?

which of course has many possible answers:

  • a shorter proof;

  • a more “elegant” proof (which is of course usually subjective);

  • a simpler proof (consider Hilbert’s 24th problem (1900): “find criteria for simplicity of proofs, or, to show that certain proofs are simpler than any others.” [45]);

  • thinking in terms of Reverse Mathematics – a proof in a weaker subsystem of \(Z_{2}\) (Second Order Arithmetic);

  • an interdisciplinary proof (e.g. a geometric proof for an algebraic problem or vice-versa would be considered to give a deeper mathematical insight);

  • a proof that is easier to reuse i.e. if it provides some algorithm or technique or intermediate result that can be useful in different contexts too;

  • a proof giving “better” computational content.

In the aforementioned proof mining papers [29, 31] it turns out that different proofs of similar statements (about the fixed points of nonexpansive semigroups) give completely different computational content, which in turn, provokes the question:

What do we mean by “better” computational content?

  • a bound of lower complexity?

  • a bound that is more precise numerically?

  • a bound that is more “elegant”?

It would be reasonable to ask too:

  • How are the aforementioned proof features related to each other (if at all)?

  • Could we ever ensure that we get the optimal computational content (from a given proof)?

The answers to the above questions are non-trivial – and my hope is that formalisation could provide us with some useful insights.

I joined the ALEXANDRIA project in October 2017, some months after obtaining my PhD in pure mathematics and with no prior formalisation experience in any proof assistant. Although proof mining has trained me in patiently de-constructing proofs into every elementary sub-step and in tidying up proofs after finding their underlying logical structure, which are both necessary skills for anyone interested in formalisation, starting to explore a proof assistant was to me a brand new challenge.

2.3 The ALEXANDRIA Project

The project ALEXANDRIA: Large-Scale Formal Proof for the Working Mathematician led by Lawrence Paulson and funded by the European Research Council, started at the University of Cambridge in Autumn 2017 and is aiming at contributing to the creation of a proof development environment for working mathematicians through a collaboration of mathematicians and computer scientists. The focus of the project is the management and use of large-scale mathematical knowledge, both as theorems and as algorithms. To formalise mathematical proofs we make use of the proof assistant Isabelle/HOL. Our goals involve not only the contribution to the Isabelle libraries with new formalised material, but also the management and organisation of the libraries, as well as the creation of interactive tools that would assist mathematicians in the process of both formalising known proofs and discovering new results. To this end, tools facilitating automation and search are important priorities. The ALEXANDRIA group members are Lawrence Paulson, Wenda Li, Anthony Bordg, Yiannos Stathopoulos and myself. For more information I refer to the project description by Lawrence Paulson [47].

3 First Steps in Isabelle/HOL

Isabelle has several major advantages over other proof assistants: first of all, it uses the structured language Isar, which is understandable by both humans and machines. Moreover, it supports automation tools, which are much more efficient than automation in any other proof assistant. The main automation tool featured by Isabelle/HOL is Sledgehammer [5], which searches for proofs by calling a number of external automated theorem provers, and afterwords converts the proofs (if found) to Isar. Also, there is the command try0, which calls a number of widely used proof methods in search of a proof (simp,  auto,  blast,  metis,  argo,  linarith,  presburger,  algebra, fastforce,  force,  fast,  meson,  satx). There are also certain counterexample finding tools (nitpick and Quickcheck). Sledgehammer, nitpick and Quickcheck are not supported by Isabelle/ZF. We will discuss more Isabelle’s automation later. Another advantage of Isabelle is that its libraries are already quite extensive – a great deal of material has been formalised over the years and the libraries are constantly growing. Isabelle is based on simple types and the internal/implementation languages are Standard ML and Scala. It does admit classical (i.e. non-constructive) proofs. In particular, Isabelle/HOL includes the axiom of choice. The default user interface is Isabelle/jEdit, which provides real-time proof checking, as well as rich semantic information for the formal text and direct links to the user manuals and tutorialsFootnote 7 (see “Documentation” at the left-hand side of the user interface).

The first step is of course to download Isabelle (which is distributed for free under open-source licenses) from the Isabelle website [21]. Installation is straightforward. Next, one should get familiar with the Isabelle Libraries and the Archive of Formal Proofs (AFP) [1]. The HOL and ZF Libraries can be found in [22] and [23] respectively. The AFP contains a vast collection of formalised material in a variety of topics in mathematics (algebra, analysis, probability theory, number theory, geometry, topology, graph theory, combinatorics, category theory) as well as some applications in economics and physics. Moreover, the AFP contains extensive proof libraries in computer science and logic. As of 2 September 2020, there are 553 entries by 362 authors [1]. Isabelle users can get help with their questions on the Isabelle mailing listFootnote 8 as well as on the Isabelle Zulip chat.Footnote 9

Reading the manuals and tutorials can be helpful, but usually learning-by-doing can be more efficient. It would be thus advisable to start by exploring some theories from the library (e.g., after having installed Isabelle, by opening a session and going to

File→ Open→ isabelle→ src→ HOL→ Analysis

to get to the Analysis Library theory files) and/or download one of the AFP entries (there are detailed instructions on the website [1]) and make modifications to see how it responds.

A theory file starts with theory [the name of the file] followed by imports [the libraries or AFP entries that need to be imported for the purposes of the theory at hand]. The theory is then developed between the keywords begin and end. The file should be saved with the same name that is used right after theory, followed by “.thy”.

The user can call the automation tools to look for proofs simply by typing try0 and Sledgehammer. In order to search for already formalised material in the loaded theories of the active session, the user can type find_theorems or find_theorems name: followed by some search word. While looking at an open .thy file, the user can place the cursor on any object and by pressing the Command key and clicking, the jEdit interface takes the user to the theory where the object is defined. The keyword sorry can be used in the place of a missing proof, so that a statement can be temporarily regarded as “proven” and the user may even refer to the result to use it in other proofs and return to the missing proof later. The keyword oops has a similar use, except that it “abandons” a proof typically in the middle; unlike sorry, it cancels the entire proof attempt up to the previous lemma/theorem/proposition/corollary keyword and the claimed statement is not available to be used.

Looking at examples online from the Library [22] and the AFP [1], the user will notice the variation of colours, which helps to distinguish between different kinds of keywords. For example, text with comments or headings, and facts in quotations that had been previously assumed/defined/shown within the same proof appear in orange, words indicating the structure of a theory or a proof, like theorem, lemma, proof, by, qed, subsection appear in navy blue, keywords of an imperative nature within a proof such as fix, assume, thus, show, obtain appear in light blue, keywords indicating the posing of assumptions and conclusions i.e. assumes, fixes, and, where, shows, if, is appear in green, entire mathematical statements appear in pink, the keywords apply and done referring to applying one or more proof methods appear in red, comments within the symbols (* and *) appear in dark red, and everything else appears in black.

Note that while working directly on a formalisation within an active Isabelle session the colours will differ, namely the user will notice that fixed/free variables appear in blue and bound variables appear in green.

3.1 Examples

Most new users would agree that Isar looks understandable, at least not much less than code: a few examples with some clarifications are given below. The reader is invited to locate the following examples, which can be accessed online on the Library or the AFP as given. However, in order to be able to work directly with the theory (and to decipher any references to unknown objects by Command-Click which will take the user to the wanted definition of the object in question), the user must open these theories via an Isabelle session following the previously given instructions.


A definition of the 7-dimensional real cross product from my AFP entry on Octonions [34]: We have defined a multiplication operation between two 7-dimensional real vectors \(\vec{a}\) and \(\vec{b}\) giving a 7-dimensional real vector, so that each of its coordinates is given in terms of the coordinates of \(\vec{a}\) and \(\vec{b}\); for example, the first coordinate will be given by \(a_{2} \cdot b_{4} - a_{4} \cdot b_{2} +a_{3}\cdot b_{7} - a_{7} \cdot b_{3} + a_{5} \cdot b_{6} -a_{6} \cdot b_{5}\).

figure b


A statement attesting that norm equality implies inner product equality (and the converse) together with a proof, from the Analysis/HOL Library [22], in particular from the theory Inner_Product by Brian Huffman: Notice that the proof here is “apply-style” and not a structured Isar proof. The user can see the definitions of each symbol and of norm, inner, as well as the statements of the lemmas order_eq_iff, norm_le that are used in the proof by placing the cursor on the object in question, pressing the Command key and clicking.

figure d


Let us now look at an example in more detail, comparing the “informal” with the formal proof. Consider the following version of the Cauchy-Schwarz inequality together with a proof. An “informal” proof is given below. Note that we use \([\_\_\_]\) to denote the “trivial” intermediate arguments which are missing and will be explicitly shown in the formal proof.

Cauchy-Schwarz inequality: For every two vectors \(\vec{x}\), \(\vec{y}\) of an inner product space over the field of real numbers, \(|\langle \vec{x}, \vec{y} \rangle |^{2} \leq \langle \vec{x}, \vec{x} \rangle \cdot \langle \vec{y}, \vec{y} \rangle \).

Proof: We distinguish two cases: The case that \(\vec{y}=0\) (in which the proof is trivial \([\_\_\_]\)) and the case that \(\vec{y}\neq 0\). Assume that \(\vec{y}\neq 0\). Let \(r := \langle \vec{x}, \vec{y} \rangle / \langle \vec{y}, \vec{y} \rangle \). We have that: \(0 \leq \langle \vec{x}- r \vec{y}, \vec{x}- r \vec{y}\rangle~[\_\_ \_]\ = \langle \vec{x}, \vec{x}\rangle - \langle \vec{y}, \vec{x} \rangle \cdot r~[\_\_\_]\ = \langle \vec{x}, \vec{x}\rangle - |\langle \vec{x}, \vec{y} \rangle |^{2} / \langle \vec{y}, \vec{y} \rangle~[\_\_\_]\). So we have shown that \(0 \leq \langle \vec{x}, \vec{x}\rangle - |\langle \vec{x}, \vec{y} \rangle |^{2} / \langle \vec{y}, \vec{y} \rangle \) from which follows that \(|\langle \vec{x}, \vec{y} \rangle |^{2} / \langle \vec{y}, \vec{y} \rangle \leq \langle \vec{x}, \vec{x} \rangle~[\_\_\_]\) from which the wanted inequality immediately follows \([\_\_\_]\). QED.

The above proof is written in Isabelle (this is from the Analysis/HOL Library [22], in particular from the theory Inner_Product by Brian Huffman) as follows (note that in the place of every \([\_\_\_]\) we have provided proofs which are indicated by the keyword by in navy blue followed by some proof method, or by some proof method and one or more lemmas):

figure f

We continue with a few more examples of well-known statements formulated in Isabelle.


The statement of the Stone-Weierstrass theorem for polynomial functions, from the Analysis/HOL Library [22], in particular from the theory Weierstrass_Theorems by Lawrence Paulson:

figure h


A version of Zorn’s lemma, together with a proof, from the HOL Library [22], in particular from the theory Zorn by Jacques Fleuriot, Tobias Nipkow and Christian Sternagel, ported from Lawrence Paulson’s Zorn theory development from Isabelle/ZF:

figure j


The statement and a proof of the Riemann mapping theorem, from the Complex_Analysis/HOL Library [22], in particular from the theory Riemann_Mapping by Lawrence Paulson (note that the formal proof here is very short as it directly follows from auxiliary statements formalised in the same theory):

figure l

4 First Difficulties Encountered

4.1 Syntax

As in any new programming (or even natural) language, the first challenge for a new user is familiarisation with the syntax and the essential keywords. In fact, I found Isar to be both quite intuitive in terms of structure and easily readable, while the jEdit user interface is very user-friendly. The fact that Isar admits structured proofs is a major advantage. There are, however, certain (mostly syntactic) features that may seem surprising to a new user. Some miscellaneous characteristic examples are:

  • The standard proof patterns

    have "a<b" also have "...<c" finally show "a<c" by auto and

    have "a<b" moreover have "...<c" ultimately show "a<c" by


  • The syntax does not allow for shortcuts such as “\(a < b < c\)” or “\(a, b, c >0\)”, instead one has to write "a<bb<c" or "a<b" and "b<c", respectively

    "a>0b>0c>0" or "a>0" and "b>0" and "c>0".

  • The user has to remember to always include type information. Even arabic numbers are in certain cases regarded as constants of some unknown type unless their type is explicitly stated (e.g.(1::int)/(2::int)^0 = 1 is proven just by the proof method simp while for 1/2^0 = 1 automation gives no answer nor do we get any explanatory error messages). Moreover, for exponentiation, if the exponent is of type real or integer one has to use powr, while for type natural the symbol ^ works. Also, often the user has to switch the type of a variable or constant (of_int, of_real, of_nat) when confronted for instance with division.

  • The meaning of some keywords, e.g.: where, that, when, at_top,


  • The use of several symbols, e.g.: the meet and join operators for lattices are symbolised by ⊓, ⊔ instead of ∧, ∨ that are normally used in the literature; different kinds of arrows; the absolute value symbol.

  • Overall the extremely high level of detail in which proofs must be written.

Discovering such features for the first time may be slightly time-consuming for a new user as error messages are not always helpful. New users should not be discouraged by the potentially slow progress during the very first stages of their learning as after some practice the learning process accelerates significantly and becomes very rewarding.

4.2 Search Features

After syntax, search is of very high significance for the mathematician user, both in terms of finding material by thematic classification, and of finding essential technical lemmas and definitions while in the process of formalising new work. In the latter case, it is extremely helpful that the interface allows for words to function as hyperlinks: as mentioned, by pressing the Command key while clicking on each object, the user is taken to the theory where the object at hand is defined. However, there is room for improvement with respect to search features, from different points of view. In particular:

(1) It is possible to search for theorems only on the basis of symbolic patterns occurring in them or of their names (via find_theorems or find_theorems name: as mentioned) so an unexpected name could create obstacles. Even though find_theorems is helpful most of the time, in some cases it could be limiting. In particular:

  • As search is done based on pattern-matching, especially new users may sometimes not know what are the appropriate search words for the notions that they are looking for (e.g. “summable” yields many results while “summability” yields no results, “infimum” and “supremum” yield no results while a lot of related material exists in the libraries etc). That is, searching for associated concepts would be more helpful.

  •  find_theorems is case-sensitive (e.g. “borel” gives 510 results while “Borel” gives no results). We are aiming for case-insensitive search.

  • Search is performed only in the libraries and theories that have been already loaded by the user. Ideally, search would be performed in all the libraries (and the AFP) regardless of what the user has loaded.

  • It would be useful to differentiate the search for facts based on mathematical objects related to their statements from the search for facts based on mathematical objects related to their proofs and to patterns that may occur in their proofs.

  • An efficient method of filtering and ranking the search results would also be useful to have.

(2) At the same time, manual search in the Library can be rather time-consuming, which is a consequence of the four following facts: (a) the fast growing size of the Library, especially the Analysis Library. Tobias Nipkow has recently initiated an effort to create a comprehensive manual for the contents of the Analysis Library. Fabian Immler, Lawrence Paulson, Manuel Eberl, Wenda Li, Mohammad Abdulaziz and myself have been contributing to this work in progress. (b) The general difficulty in classifying mathematical knowledge, as very often the borders between mathematical disciplines are unclear. Users may have difficulty finding material that is classifiable in different ways. (c) It is not unusual in mathematical literature to use different names in different contexts for the same notion (synonymy). For instance, I had been looking for a formalisation of the valuation operator \(v_{2}\) i.e. \(v_{2} (n) = \max A\) for \(n \in \mathbb{Z}\) where \(A = \{k \in \mathbb{N}: 2^{k}~\text{divides}~n \}\). I was informed by Manuel Eberl that he had formalised this under the name “multiplicity” (and in fact for \(p\)-order, not just 2-order), in the Computational Algebra Library, in Factorial.Ring.thy [22]. (d) And vice-versa: using the same name while referring to different notions (polysemy). This unfortunately happens too often: see for example the response received on MathOverflow when I asked the community about instances of this phenomenon in their field of mathematical research [44]. This is obviously one of the sources of errors for the mathematical literature that we hope to eliminate thanks to formalisation, nevertheless it can cause problems for the actual formalisation process as well because of the challenges it creates for the organisation of the Library.

(3) Efficient search for proof patterns and algorithms is another big challenge to be tackled. To this end, employing tools from machine learning is considered very promising.

Within our project, Yiannos Stathopoulos and I are currently working on SErAPIS (: Search Engine by the ALEXANDRIA Project for Isabelle), a concept-oriented search engine [59, 60], by using techniques from natural language processing. A different search engine for Isabelle is independently being developed by Fabian Huch and Alexander Krauss in Munich [20].

4.3 Automation

The high level of detail that is required when formalising a mathematical proof can render the process time-consuming, thus efficient automation is vital. This is one important factor that makes Isabelle/HOL more user-friendly than other proof assistants. In general, Sledgehammer and try0 work remarkably well for very simple statements, while the possibility of counterexample finding with Quickcheck and nitpick can be very helpful. However, in many occasions certain elementary examples (e.g. splitting up summations) cannot be solved by automation as one would initially expect. Another observation is that for algebraic expressions where minus is involved, it is often much easier to find proofs by automation when we are working with type integer than when we are working with type natural, due to the respective definitions not having the same algebraic properties. Moreover, in certain cases where long and complicated expressions are involved and in examples involving distributivity, simplification, handling inequalities involving division/multiplicative inverses, comparing inequalities after multiplying with a complicated expression, comparing complicated expressions involving logarithms or exponents, Sledgehammer times out (while try0 also fails), even though in examples of the same style but involving simpler expressions they are successful in finding proofs. It is worth noting that automation power is constantly getting improved, so in a couple of years, if not sooner, more and more complicated examples will most likely be solvable by automation.

As a sidenote regarding automation, a common source of errors for inexperienced users is the use of the keyword sorry which temporarily regards an unproven statement as proven, so that the user can go on with the formalisation and even use the statement in other proofs, and return later to fill in the gap. This may lead automation to suggest a proof of a statement B using a wrong statement A that had been “shown” by sorry prior to statement B. This is a consequence of the principle of explosion in logic (ex falso sequitur quodlibet–from falsehood anything follows). Even though sorry is extremely helpful, new users should be aware of the above issue.

4.4 Using the Formalised Material

Finally, other issues of secondary importance that the user should be aware of relate to the rigidity of the structure of the formalised fundamental material in the Library. For instance, the definition of an “algebra” in the Library includes associativity, so working with nonassociative algebras is not possible within an existing type class, e.g. see my development of Octonions [34] following Lawrence Paulson’s development of Quaternions [52]. Another example is an observation by Anthony Bordg that a ring-scheme is required in order to define an abelian monoid.Footnote 10

It should be noted that in my experience so far I did not encounter a problem with the fact that Isabelle is based on simple types instead of dependent types. Lawrence Paulson discusses this issue [53].

5 Experimenting and Exploring

For the first project I suggested to formalise a proof that is (1) research-level, (2) using elementary material that is already in the Isabelle Library, (3) not very long or too complicated. I thus opted for an irrationality criterion for infinite series by Hančl [18] that fulfills the above prerequisites and Wenda Li and I formalised it [38]. It turned out that the Library did not contain a development for infinite products that were used in the proof (at that time we were using Isabelle 2017), so we used infinite sums via the logarithm instead. With this incentive, Lawrence Paulson soon afterwards wrote a development for infinite products (extending earlier work by Manuel Eberl) which was included in Isabelle 2018.

After a discussion with Anthony Bordg I realised that it can also be meaningful to formalise a theorem even if all the proofs of the statements its proof relies on are yet not fully formalised. This would be achieved by using the theorems on which our theorem at hand depends as assumptions (implemented as a locale, or within a context). This is a compromise on the vision of “absolute correctness” but it is a more realistic approach that reflects actual mathematical practice, and at least provides verification on a level of “relative” correctness. An obvious benefit of this approach would be the possibility to reach more advanced mathematics more quickly. I am still too reluctant to recommend this, as a generalised use of it could help spread errors in the literature undermining the whole point of verification and even do more harm than good if the wrong results have the “Isabelle/HOL-verified” seal. But provided that (1) the theorem used as an assumption is safe as a central/mainstream result that has been widely and carefully checked and (2) assumptions and methods are made fully transparent, it can be a worthwhile experiment. I thus decided to formalise some result in research-level mathematics following from some well-known, fundamental theorem the proof of which has not been formalised, so Wenda Li and I formalised two transcendence criteria for infinite series by Hančl and Rucki [19] that make use of Roth’s celebrated theorem on rational approximations [55]. In our formalisation [39], Roth’s theorem was given only as an assumption. The process of formalisation revealed a slight mistake in the original proof [19] which, as it turned out after corresponding with the authors, was very easily fixable. Wenda Li and I have also recently formalised certain irrationality criteria for infinite series by Erdős and Straus from 1974 [10]. As noted in our formalisation work [40], certain inequalities from one of the proofs of the original paper required a few minor corrections (this did not affect the correctness of the statements nor the general proof structure).

While formalising Aristotle’s assertoric syllogistic in Isabelle/HOL [33] (following [57]), I noticed that Sledgehammer was very efficient: it is interesting that it could find straightforward one-line proofs for statements that ancient philosophers showed with much lengthier arguments. In mathematics it is, of course, usually the opposite, formal proofs are longer than the original; the de Bruijn factor is used to measure how much longer a formal proof is, and it is defined as the ratio of the number of lines of the formal proof to the number of lines of the corresponding informal proof in TeX encoding [63, 64].

I have also recently formalised some material on amicable numbers [35], which revealed the efficiency of certain methods with respect to numerical computations. In particular, the method simp immediately shows primality for small prime numbers (for larger primes, to show primality I make use of Pratt Certificate [67] which is also straightforward to apply) and the method eval helps to show what is the set of all divisors of a number.

Another direction that currently interests me is a possible use of formalisation for the benefit of automating pen-and-paper proof mining, which I conjecture could be achieved once machine learning comes into play. I suggest, in particular, that while building extensive libraries of formal mathematical proofs, it would be meaningful to opt for formalising proofs whose computational content is made explicit in the meantime, so that as automation improves and blocks of (sub)proofs get generated automatically, the preserved computational content would get recycled, recombined and would eventually manifest itself in different contexts. To this end, I do not suggest restricting to only constructive proofs, but I suggest that proof mined (i.e. possibly also non-constructive proofs but with some explicit computational content) should be preferable, if possible [36, 37].

6 Disclaimers and Cautions

Using a proof assistant to verify mathematics is somewhat reminiscent of writing down a relative consistency proof as logicians do. (Of course, this is not a problem for formalists.Footnote 11 Timothy Gowers claims that “modern mathematicians are formalists, even if they profess otherwise” and that “it is good that they are” [15]. Kevin Buzzard, as many mathematicians interested in automated reasoning, also describes himself as a formalist).

This reminiscence can be seen on two different levels: (1) From the point of view of the correctness of the core of the system. This can be trusted considering the underlying architecture of the proof system and the small size of the core [51, 53] and as (on a more practical level) the user interface provides a window showing the proof state at every step so the user can be in control of the proof. (2) From the point of view of the correctness of the mathematical assumptions, which is the issue I discuss below. Using a proof assistant does not guarantee “absolute” correctness, as the work might always be prone to human errors. An obvious way of making a bad use of a proof assistant thus leading to human errors going undetected is proving something different than what was initially intended or claimed (while the intended statement might even be false) because of either (a) using a misleading name of the proved statement/ misleading definitions [44] or (b) something as simple as a typographical error in an intermediate step of the proof like a misplaced parenthesis e.g. showing \(f(n+1)\) instead of \(f(n) +1\). Also mind the principle of explosion in logic (from falsehood anything follows)!

6.1 Different Kinds of “Wrong” in Mathematics and Isabelle/HOL

This part is to be read as a caution to new proof assistant users; as already explained, using a proof assistant is not a panacea guaranteeing correctness, as the work might always be prone to human errors which, as we illustrate in the examples below, may not be detected by the proof assistant. Mathematicians may make mistakes that fall into different categories. Several (extremely naive) examples for each category of mistakes and “mistakes” that may be committed by human mathematicians are given below, and it is shown how Isabelle may not necessarily detect them, as they are essentially “semantic” mistakes that are syntactically correct. Of course, a mathematician would not make the specific elementary mistakes mentioned below, but mistakes of a similar nature in a much more sophisticated context could possibly be made, and as is the case with the naive mistakes below, they may be similarly overlooked.

6.1.1 Proving a Conclusion That Is Too General

figure n

This is logically correct, but mathematically undesirable. This is accepted by Isabelle as an instance of the logical axiom \(A \rightarrow A \lor B\).

6.1.2 Using a Superfluous Assumption

figure p

This is again logically correct, but mathematically undesirable and accepted by Isabelle as an instance of \(A \land B \rightarrow B\) from logic. (As a sidenote, the proof method metis in particular does warn the user about unused theorems in the proof. Moreover, users can check themselves for unused assumptions with find_unused_assms, but this is not always efficient).

6.1.3 A Logical Inconsistency in the Assumptions

We have seen that superfluous assumptions are permitted–moreover it can be that some might even contradict each other:

figure r

An instance of the above in mathematics, for example would be:

figure t

6.1.4 Assuming One (or More) Wrong Fact(s)/ Assumption(s) That Cannot Be Fulfilled

figure v

The first examples are instances of assuming a wrong mathematical fact while the last one is an instance of the assumptions being incompatible as in the above subsection. For example, such mistakes (unless of course we are using assumptions that we conjecture to be false so that they yield a proof by contradiction) could occur when an author makes bad use of the literature by using two or more nonequivalent definitions of a mathematical object within the same context, which may easily happen [44].

6.1.5 Lack of Precision when Approximating

This instance of course falls into the aforementioned case of assuming wrong facts, but this demonstrates how imprecise approximations (which a user may be often tempted to make, especially when dealing with numerical computations) in particular can be accepted by Isabelle without any error messages, potentially leading to further wrong numerical results.

figure x

6.1.6 Requirement of an Additional Assumption

Mathematical practice involves trial-and-error (Lakatos [41]). A human mathematician would try to figure out if they could prove their conjecture to be true after adding or removing assumptions. Although often Isabelle helps towards this direction by finding counterexamples thanks to nitpick or Auto Quickcheck, usually the user would not directly receive a suggestion by Isabelle to modify the assumptions. For example, in the following it would be useful if Isabelle would explicitly suggest to include the assumption \(d \neq 0\) (if this is included, a proof by simp is immediately found):

figure z

6.2 Discussion

Examples 6.1.1-6.1.4 should not be considered surprising: we can view them as a manifestation of the fact that Isabelle relies on logical inference [51] while mathematics, even though it is “a method of logic” as Wittgenstein wrote [68], is, moreover, richer in content and meaning. From a logical point of view \(A \land B \rightarrow B\) is fundamental, but from a mathematical point of view assuming \(A \land B \) (that could internally amount to \(B\) if assumption \(B\) would be sufficient to prove some specific end goal) would appear as making use of a superfluous assumption which is undesirable; from a logical point of view the principle of explosion (from falsehood anything follows) is fundamental, but in mathematics proving a false conclusion from assumptions that we already know are false would be a senseless pastime (except, of course, in the case of assumptions that are only conjectured to be false when writing a proof by contradiction); from a logical point of view \(A \rightarrow A \lor B\) is a fundamental axiom, but from a mathematical point of view it would not be an interesting deduction to make.

Examples 6.1.5 and 6.1.6 are not surprising either, as for Isabelle to suggest a modification of the assumptions as a human mathematician would do would require Isabelle to feature a kind of “Intelligence” that no proof assistant nowadays features (not yet?). Russell and Norvig [56] (Chap. 1.1) distinguish four different categories of definitions of artificial intelligence, none of which is safe to say that is currently fully satisfied by Isabelle, nor by any other proof assistant to my knowledge.

The conclusion to draw from the above examples is that, after all, the user is the one in control and thus should be careful to use Isabelle (or any other proof assistant) responsibly. It should be stressed that the high value of a proof assistant lies in the fact that it prevents another very common and dangerous source of fallacy in mathematical practice: being misled by one’s intuition thus allowing for gaps in a proof or for overlooking mistakes. A proof assistant like Isabelle would not allow for this to happen, as every argument is required to be made explicit in all detail. In my own experience working in proof mining, which is pen-and-paper work revealing the hidden logical structure of proofs, it sometimes turns out that the original proof can be written in a more detailed and structured way. However, working in automated reasoning reveals the requirement of an even higher level of rigour. Another benefit follows from the flexibility and interactive nature of working with code in general: one can more easily uncover hidden dependencies, e.g. one can see how and where a proof breaks by making changes such as deleting assumptions.

7 Towards the Future

Is the way that we are doing mathematics being revolutionised? Most likely, yes.

This is reflected in an important recent milestone; for the first time, a class titled “Computer science support for mathematical research and practice” is included in the new Mathematics Subject Classification announced in early 2020 (MSC 2020, Class 68Vxx). This class includes topics such as computer assisted proofs, proofs employing automated/ interactive theorem provers, formalisation of mathematics in connection with theorem provers. Thus, the aforementioned topics are now officially recognised as areas of mathematical research (in addition to computer science research as it was until now).

As an anecdote indicative of the current climate I mention that during the panel discussion of the workshop Foundations in Mathematics: Modern Views in April 2018 in MunichFootnote 12 that attracted young (mostly student-level) mathematicians, philosophers and logicians, interestingly the dominant view discussed arguing for the importance of exploring the foundations of mathematics was their significance for computerised mathematical proofs which among the participants of the discussion was regarded as an inevitable development.

This revolution in mathematical practice can be regarded as two-fold: The first aspect is associated with our expectation of a certain level of correctness and thus refers more to the benefits of formalisation for the purpose of verification. Barendregt and Wiedijk [4] beautifully summarise this vision in the citation given in the beginning of this article.

The second aspect of the revolution is associated with the tools used in our day-to-day work. A groundbreaking tool would be an interactive assistant that would provide working mathematicians with brainstorming in the form of tips on how to prove a statement, or even suggestions for new conjectures. This would be achieved by implementing machine learning tools in an appropriate way, and after a substantial amount of mathematics has been formalised. Today we are far from this goal, but even before we approach it, just typing mathematics in Isar (or any other machine language) instead of using paper and pen is already a big shift in our way of working, which would inevitably influence our way of thinking. We are one of the first generations that now type much more than using a pen in our daily lives, and it is conceivable that our strong impulse to write with a pen or a piece of chalk while shaping a brand new thought or trying to understand an idea may disappear in future generations. This is of course an issue related to psychology and cognitive science rather than purely to mathematical practice. As a striking example it should at least be noted that most mathematicians nowadays have no trouble understanding code, while a few years back it was considered rather unusual, and most people would agree that Isar looks not more unnatural than .Footnote 13

Aristotle in his Nicomachean Ethics wrote [2] (Book VII, Chap. 8, 17-19):

“In mathematics, it is not reason that teaches first principles, nor is it in actions; rather, it is virtue, either natural or habituated, that enables us to think correctly about the first principle.”

Could it be that, one day, current mathematical practice might seem to the future generations as antiquated (or as peculiar), as the above statement may sound to us nowadays?