1 Introduction

What does it mean to say that a mathematician has understood a certain mathematical proof? In this paper we approach the topic of mathematical understanding by combining analytical tools from various academic disciplines, most notably the frame concept.

The frame concept (further explained in Sect. 2) is used in linguistics, cognitive science and artificial intelligence to model how explicitly given information is combined with expectations deriving from background knowledge. We apply this concept to the context of mathematical proofs, which readers interpret by complementing the explicitly given information with their expectations about how proofs are usually structured depending on the applied proof method and how mathematical structures and objects are usually referred to in the text. These expectations can be modeled through different kinds of frames, namely structural frames that specify expectations about how proofs that use various proof methods are usually structured, and ontological frames that specify mathematical structures and objects and expectations about how they are usually referred to.

Note that among the goals of our mathematical education are learning about various kinds of mathematical structures as well as learning various proof techniques in the sense of being able to adapt them to other contexts or problems. In terms of the frame model, these goals are achieved when certain frames required for the understanding of proof texts – as well as the ability to apply these frames in practice – have been acquired.

As an illustration of how the frame concept can be fruitfully applied to shed light on the nature of proof understanding, consider the case when a mathematics student has laboriously checked all details in a complex proof but does not see the big picture of how all these proof steps work together. In such a case we would normally not ascribe to that student understanding of that proof. In terms of frames this can be explained by saying that the student has not recognized which high-level structural frames were involved in the proof.

To discuss why the frame concept is helpful for the discussion of mathematical knowledge, we must first discuss the concept of understanding (see Sect. 3). We build upon a concept of understanding in the hermeneutic tradition of philosophy. Another analytical tool that we make use of is a multi-layer analysis of mathematical texts based on Schmid’s (2008, 2010) ideal genetic model of narrative constitution, which distinguishes four levels of narrative texts: the happenings, the story, the narrative and the linguistic presentation of the narrative (see Sect. 4). We propose that such a distinction is useful for analysing mathematical proofs, as well. For example, when mathematicians informally say that a text presents the same proof as another text, only that it arranges the proof steps in a somewhat different way, this corresponds to having the same story arranged in two different narratives.

Through a case study on extremal proofs (Sect. 5) we illustrate that these analytical tools can fruitfully shed light on the nature of proof understanding. In Sect. 6 we show how our findings from the case study can be generalized to further aspects of mathematical understanding. Finally, we compare our analysis of proof understanding to Avigad’s (2008) analysis of proof understanding in terms of the abilities that we require of someone to whom we ascribe proof understanding (Sect. 7).

2 The Concept of Frames

The frame concept originates from linguistics, cognitive science and artificial intelligence. It models how explicitly given information is combined with expectations deriving from background knowledge. In the context of mathematical proofs, readers of proofs have expectations due to their mathematical training. These allow them to interpret a mathematical text and to complement it with additional relevant information.

The concept of frame in artificial intelligence is generally traced back to a report by Minsky (1974) which introduces frames as a general “data-structure for representing a stereotyped situation, like being in a certain kind of living room, or going to a child’s birthday party” (p. 1), organised in a hierarchy (called frame-system by Minsky). “Situation” should not be understood too narrowly, as frames can be used to model concepts in the widest sense, and Minsky illustrates the use of frames with a multitude of examples from vision to story understanding.Footnote 1 For each frame, features are defined, which can be (sub-)frames again. Features, also called slots, can have default values and can carry constraints; slots are filled by concrete values. In formal systems such as the Frame Representation Language (FRL, cf. Roberts and Goldstein 1977), among others, a procedure can be attached to a slot that will calculate the value of a feature from other values when needed. In the world of mathematics, for instance when talking about a circle in Euclidean geometry, we need the centre and the radius to describe the circle; we can calculate its diameter, if needed. Other aspects of formalization developed in later approaches and the related concept of feature structures (see e.g. Carpenter 1992) include the use of types organized in an inheritance hierarchy to constrain both the proliferation of features and their values.
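To make the notions of slot, default value, constraint and attached procedure more concrete, the following is a minimal sketch of the circle example in Python. It is not FRL syntax and not Minsky's own formalization; the class names `Slot` and `Frame`, the default centre at the origin and the positivity constraint on the radius are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Slot:
    """A feature of a frame with optional default, constraint and procedure."""
    default: Any = None
    constraint: Optional[Callable[[Any], bool]] = None
    procedure: Optional[Callable[["Frame"], Any]] = None  # computes a value on demand


@dataclass
class Frame:
    name: str
    slots: Dict[str, Slot]
    values: Dict[str, Any] = field(default_factory=dict)

    def fill(self, slot_name: str, value: Any) -> None:
        """Fill a slot with a concrete value, checking its constraint."""
        slot = self.slots[slot_name]
        if slot.constraint is not None and not slot.constraint(value):
            raise ValueError(f"{value!r} violates the constraint on {slot_name}")
        self.values[slot_name] = value

    def get(self, slot_name: str) -> Any:
        if slot_name in self.values:      # explicitly filled
            return self.values[slot_name]
        slot = self.slots[slot_name]
        if slot.procedure is not None:    # calculated from other slots when needed
            return slot.procedure(self)
        return slot.default               # otherwise fall back to the default


def make_circle() -> Frame:
    # Hypothetical circle frame: centre and radius describe the circle,
    # the diameter is calculated by an attached procedure.
    return Frame(
        name="Circle",
        slots={
            "centre": Slot(default=(0.0, 0.0)),
            "radius": Slot(constraint=lambda r: r > 0),
            "diameter": Slot(procedure=lambda f: 2 * f.get("radius")),
        },
    )


circle = make_circle()
circle.fill("radius", 1.5)
print(circle.get("diameter"))  # 3.0
```

Only the radius is given explicitly here; the centre comes from a default and the diameter from the attached procedure, which mirrors how explicitly given information is complemented by background knowledge.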

In linguistics, FrameNetFootnote 2 is an important approach using frames. It develops a hierarchical model of (mainly) verb semantics whose main constituents are the participants and their semantic roles. Verbs evoke frames which then provide certain roles that can be taken on by entities in the discourse universe. Roles are generally divided into core and non-core roles, which captures the salience and optionality of the roles within a frame. For example, the FrameNet frame Commerce_buy has the core roles Buyer and Goods, while Seller, Money and Means (e.g. cash vs. check) and many others are non-core roles. The differences in role assignment capture semantic (and pragmatic) differences between verbs. Consider a simple example. Explicitly realised frame elements are annotated in the sentence:

figure a

The frames representing buying and selling can be represented in a feature-value matrix as in (2), which also illustrates subframes (in the Time field). The exclamation mark indicates core roles, and [[expression]] is the semantics of the expression. We use point-in-time, person, money, purpose as labels (for types) that constrain the potential fillers. To spare the reader the enumeration of slots that have not been realised explicitly, we abbreviate the frame, indicating this with ellipsis dots. From now on, we will always assume that a description of this kind is generally partial, so that ellipsis dots are not needed.

figure b

While both sentences evoke similar frames, the Seller slot need not be filled in the first case. However, it is present and could be filled, just as the Time slot specifying when the transfer occurs is present (but usually non-core) with most verbs, and can be filled with more or less specific values.Footnote 3
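The distinction between slots that are present and slots that are actually filled can be sketched as follows. This is a simplified illustration and not the FrameNet data format; the class `RoleFrame` and the example fillers are hypothetical, while the role names follow the description of Commerce_buy given above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RoleFrame:
    name: str
    core: Tuple[str, ...]
    non_core: Tuple[str, ...]
    fillers: Dict[str, str] = field(default_factory=dict)

    def fill(self, role: str, filler: str) -> None:
        if role not in self.core + self.non_core:
            raise KeyError(f"{self.name} has no role {role}")
        self.fillers[role] = filler

    def unrealised(self) -> List[str]:
        """Roles that the frame provides but the sentence leaves unfilled."""
        return [r for r in self.core + self.non_core if r not in self.fillers]


def commerce_buy() -> RoleFrame:
    # Core and non-core roles as listed in the text (the real frame has more).
    return RoleFrame(
        "Commerce_buy",
        core=("Buyer", "Goods"),
        non_core=("Seller", "Money", "Means", "Time"),
    )


# Hypothetical sentence: "Kim bought a book."
frame = commerce_buy()
frame.fill("Buyer", "Kim")
frame.fill("Goods", "a book")
print(frame.unrealised())  # ['Seller', 'Money', 'Means', 'Time']
```

The unrealised roles remain part of the evoked frame and could be filled by later discourse, which is the sense in which the Seller slot "is present and could be filled".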

More recently, the concept of frame has been developed further in linguistic and philosophic projects, most notably the SFB 991: Die Struktur von Repräsentationen in Sprache, Kognition und Wissenschaft in Düsseldorf (see e.g. Gamerschlag et al. 2014, 2015). The research elaborated the connection of the concept of frames to the semantic category of functional concepts, and also the history of scientific language, while also connecting to discourse analysis (see e.g. Ziem 2008, 2014). Regarding the formal representation of frames, Petersen (2015) develops a model using feature structures closely related to Carpenter’s (1992) and highlighting the connection between frames and functional concepts (see Löbner 2015).

When applying the frame concept to mathematical proofs, we hypothesize that (at least) two kinds of frames play a crucial role. First, structural frames capture expectations about how different kinds of proofs and definitions are structured; consider e.g. the proof techniques in Engel’s (1999) Problem-Solving Strategies. Secondly, mathematicians have ontological background knowledge about their domain: mathematical structures and expectations about how these structures and their elements are usually referred to. A formal implementation of this would be MMT theories (Rabe 2016). We call frames representing such background knowledge ontological frames.

In a paper on modelling inductive proofs, Fisseni et al. (2019) started exploring the application of the concept of frame to mathematical proofs. Frames are seen as guiding the processing of proofs based on usual mathematical practice. Which specific elements can be expected (and which of them must be explicitly realized) in a proof may differ according to the mathematical subfield and the type of text, e.g. textbooks vs. articles in a mathematical journal. In any case, knowledge about them is necessary to understand proofs (and also to make proofs readable and avoid excessive redundancy). Frames are seen as a formalization of these expectations, and it is hypothesized that they can serve as bridges between the text and more formal representations.

An important aspect of frame systems is the interaction of frames, i.e. the filling of slots of other frames from neighbouring, sub- or superordinate frames. An example of the interaction of different frames in mathematical proofs is the dependence of the form of an induction on the underlying inductive type. We assume the general structure as in Fig. 1, which gives the traditionally known parts of the induction proof such as the Base-Case, the Induction-Step and the Induction-Variable, but also an Induction-Signature and Induction-Domain, which relate the frame to the frame induced by the type of the induction variable, a relation that is traditionally captured by speaking of induction on natural numbers or similar phrases. Without going into much detail, one can see that the Induction-Domain provides the cases in the Base-Case of the induction through its non-recursive Base-Constructors and that the Induction-Step is determined by the Recursive-Constructors used to derive the ‘successors’ of the base values.Footnote 4

The determination of the Induction-Step is twofold. First, on the formal side, the number of cases in the Base-Case and the Induction-Step depends on the number of respective constructors: for a typical induction on natural numbers, one will have one Base-Case and one Induction-Step, whereas for an induction over the complexity of formulas, one will have several Case-Proofs in the Proof of the Base-Case, one for each kind of atomic formula, and several Case-Proofs in the Proof of the Induction-Step, one for each connective of the formal language. Secondly, on the more content-oriented side, the Base-Constructors and Recursive-Constructors provide constraints on the form of hypotheses of the Base-Case and Induction-Step, namely that the hypothesis in the Base-Case is by default about the value designated by one of the Base-Constructors and the hypothesis in the Induction-Step is about a value formed with the Recursive-Constructors.
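The formal side of this frame interaction can be sketched in a few lines of Python. The sketch assumes a strongly simplified induction frame; the names `InductiveType` and `InductionFrame`, the textual form of the proof obligations and the small propositional language are illustrative assumptions and do not reproduce Fig. 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InductiveType:
    """Ontological frame of the induction domain (names are illustrative)."""
    name: str
    base_constructors: Tuple[str, ...]
    recursive_constructors: Tuple[str, ...]


@dataclass
class InductionFrame:
    induction_variable: str
    induction_domain: InductiveType

    def base_case_proofs(self) -> List[str]:
        # One Case-Proof per non-recursive constructor.
        return [f"show P({c})" for c in self.induction_domain.base_constructors]

    def induction_step_proofs(self) -> List[str]:
        # One Case-Proof per recursive constructor.
        return [
            f"assume P for the arguments of '{c}', show P({c})"
            for c in self.induction_domain.recursive_constructors
        ]


NAT = InductiveType("natural numbers", ("0",), ("n + 1",))
# Hypothetical formal language with one kind of atom and three connectives.
FORMULA = InductiveType(
    "propositional formulas", ("atom",), ("not A", "A and B", "A or B")
)

for domain, variable in ((NAT, "n"), (FORMULA, "F")):
    frame = InductionFrame(variable, domain)
    print(domain.name,
          len(frame.base_case_proofs()),
          len(frame.induction_step_proofs()))
```

The induction frame itself contains no information about how many cases there are; this is supplied entirely by the frame of the induction domain.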

Similarly and more interestingly, proof techniques in specific areas as presented in handbook articles can constitute building blocks for frames, which are then combined in conventional but also innovative ways.

Fig. 1 Induction frame. \((?!)\) marks default fillers of a slot

3 The Concept of Understanding

Understanding is a vague and multi-faceted concept. Concepts of understanding have in common that understanding means integration into a broader context, which may be a historical context (Schleiermacher) or the “Zusammenwirken aller Gemütskräfte in der Auffassung”Footnote 5 (roughly: the interplay of all mental powers in apprehension; Dilthey 1974, p. 172) in Dilthey’s psychological concept. Philosophers in the tradition of Dilthey, Heidegger and Gadamer use the concept of understanding in contrast to explaining; the latter is seen as the goal of science and lacks the subject-relatedness of understanding (cf. e.g. Mantzavinos 2020). Dilthey draws the distinction between the natural sciences and humanities along this dichotomy. Therefore, the transfer of the concept of understanding from the hermeneutic tradition to mathematics is not straightforward. In this article we will refrain from any deeper discussion of the philosophical implications of the various readings of understanding. We will abstract from the historical or cultural dimensions of mathematical texts which would be contained in a full-fledged concept of understanding, thus following the usual view on the texts in mathematical practice. We do, however, regard the concept of understanding as coined by the aforementioned tradition as the concept best suited to refer to the embedding of new information into established knowledge structures, although we limit our considerations to the integration of information from a proof text into existing mathematical knowledge and experience.

Leaving aside the vagueness and the different readings of understanding for the moment, the phrase understanding a proof still has several readings depending on different ways of using the term proof. Proof can mean an abstract formal object (O) which is indicated by a proof text (cf. the derivation-indicator view of proofs, see Azzouni 2004, 2009), but contrary to the text itself consists of a gapless logical deduction of the theorem to be proven from axioms, definitions and other previously proven theorems, i.e. understanding a proof in this sense is

  O: constructing the gapless formal representation.

But, on the other hand, by proof, we often mean just the concrete text outlining a proof structure or idea (T). If we talk about understanding O we most probably mean the subject being able to justify each step of the proof. Understanding T can refer to one or more different objectives of understanding, as there are among others:

  TE: interpretation of the referring expressions in the proof and relating the meaning to the knowledge of the required areas of mathematical objects and their relations,

  TJ: understanding the justification of the proof steps presented in T,

  TB: the bridging of deductive gaps in T,

  TR: recognition of the proof method,

  TC: understanding the choice(s) of the way of proving in T among possible alternatives.

It should be clear that these objectives of understanding could be segmented in other ways than presented here, and that they do not occur isolated from each other.

We have left out from the above list levels of understanding going beyond the text-immanent level. Understanding could also mean understanding the author’s motivation of the choices made in the proof, linking to a certain historical context, or to a context of specific foundations, relations to other texts etc. Although we believe that these levels of understanding play an essential role in the communication of experts in mathematics, we will limit our considerations to context-independent, mostly text-immanent, levels here, because they are central for the teaching of mathematics and for the concept characterized by Avigad’s criteria as discussed in Sect. 7.

Carl (2019) draws a distinction between capturing the proof structure and comprehending the proof steps in such a way that the reader is convinced. Capturing the proof structure is closely related to understanding T, while comprehending the steps is closely related to understanding O.

As in other text genres, understanding can be oriented to text-immanent features of structure and meaning, or to the writer’s intention. In this case, understanding need not even include the validity of the proof, although the use of the term proof presupposes validity. “This is a proof of the infinity of the prime numbers” usually means that a valid proof is being presented. But like other presuppositions, the validity presupposition of the term proof can be cancelled, e.g. if we talk about an erroneous proof, or a proof which later turned out to be wrong etc.

Following Avigad in the spirit of the late Wittgenstein, we will not focus on the nature of understanding as a certain subjective state, but look at criteria for the ascription of understanding to others. Avigad bases his criteria on testable abilities.

Understanding a narrative involves abstracting from the way of presentation and going back to the story (in the sense of Schmid, see Sect. 4) itself. This enables the recipient to identify the story in different presentations (e.g. in text, drama or film, in different detailedness, from different perspectives etc.). The recipient should be able to recognize motivations of the agents, to reason about the effect of counterfactual changes in the narrative, and to explain why certain facts were included in the narrative. The recipient should be able to distinguish between common and uncommon events. One can argue that on an expert level understanding also includes the ability to talk about dramaturgical aspects of the narrative and its presentation and to draw intertextual links to other texts.

In terms of frames, the abilities mentioned above include that the recipient must be capable of a flexible handling of the frames and their fillers, of disregarding certain frames and varying fillers. E.g. in order to reconstruct the story from the presentation or to identify the common story of two texts, a recipient has to abstract from frames belonging to the structuring of the narrative (comparable to some of the structural frames in our concept of proofs). This structure includes, e.g. in a dramatic presentation the segmentation into acts and scenes, in a ballad the effects of metre and rhyme, in a novel non-linear interleaving of plot lines.

Understanding in this sense does not only mean to know the network of frames which structures the object of understanding, but, moreover, it involves knowing which frames are essential (for which purpose). The criteria above for story understanding and Avigad’s criteria for proof understanding amount to more or less creative tasks.

4 Analytic Levels

Since the structuralist period and the works of Todorov and Genette, scholars distinguish at least two levels of analysis with respect to narrative text, at least one of which pertains to what happens in the narrative and one of which relates to the form of presentation. Different aspects of both have been grouped into different theoretical layers. In part V of Schmid (2008, 2010), Schmid presents his “Ideal genetic model of narrative constitution” as a solution to integrate the different notions.Footnote 6

While Schmid’s model is not designed to be applied to mathematical texts, we feel that similar distinctions of textual and semantic layers may be helpful in explaining how mathematical proofs are (written and) understood by readers. In mathematical texts, such a layered approach helps to clarify the role of structural frames. As they relate the textual form to the logical structure of the argument, they allow for the selection and completion of information in proofs.Footnote 7

The original model distinguishes four levels of a narrative that differ with respect to abstraction, from a part of the ‘world’ of the story towards the form of the text. The following characterization of levels is ours, but based on Schmid’s. The first tier is constituted by the mathematical objects, their properties and relations, as well as mathematical facts and their logical interdependencies.Footnote 8 For example, the happenings of a certain proof of Pythagoras’ Theorem would be the mathematical objects and facts related to Pythagoras’ Theorem. To integrate the proof into her mathematical knowledge, the reader must reconstruct this level from a proof presentation such as a mathematical text, which constitutes the last layer.

The intermediate levels in Schmid’s model are as follows: From the initial level, one constructs a story (Geschichte), which contains, according to Schmid, selected elements and properties. A story could be, in our understanding, a specific proof of Pythagoras’ theorem. The story is linearized, i.e. arranged in a specific order, to yield a narrative (Erzählung). This is then presented in a specific linguistic (or symbolic) way, for instance as a mathematical text, or in the form of code for an automatic proof assistant or prover.
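The relation between story and narrative can be illustrated with a small sketch. It rests on simplifying assumptions that are ours and not Schmid's: a story is modelled as a set of named proof steps together with their logical dependencies, and a narrative as a list that mentions every step exactly once. The step names are hypothetical.

```python
from typing import Dict, List, Set

# A story: each proof step is mapped to the steps it logically depends on.
Story = Dict[str, Set[str]]


def presents(narrative: List[str], story: Story) -> bool:
    """A narrative presents a story if it lists each of its steps exactly once."""
    return len(narrative) == len(set(narrative)) and set(narrative) == set(story)


def is_forward(narrative: List[str], story: Story) -> bool:
    """True if every step is presented after all steps it depends on."""
    position = {step: i for i, step in enumerate(narrative)}
    return all(
        position[dep] < position[step]
        for step, deps in story.items()
        for dep in deps
    )


story: Story = {
    "lemma_A": set(),
    "lemma_B": set(),
    "main_claim": {"lemma_A", "lemma_B"},
}

narrative_1 = ["lemma_A", "lemma_B", "main_claim"]
narrative_2 = ["lemma_B", "lemma_A", "main_claim"]
narrative_3 = ["main_claim", "lemma_A", "lemma_B"]  # "as we will see below"

for narrative in (narrative_1, narrative_2, narrative_3):
    print(presents(narrative, story), is_forward(narrative, story))
```

All three narratives present the same story. The third one states the main claim first and postpones the lemmas, so its textual order deviates from the order of logical dependency.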

We can connect this model to the two kinds of frames discussed above: To understand the proof, a reader of the proof will have to reconstruct the story (and, maybe, the happenings). We hypothesize that readers reconstruct which structural frames are used, and fill in the roles of elements accordingly. In this sense, frames provide strategies for/in understanding proof texts. The levels of understanding that are involved here are at least TE, TJ, TB, and TR. The structural frames may be understood as patterns used in narratives or proofs to shorten (and unfold) presentations. Ontological frames concern the theory, i.e. the (static) background knowledge about relationships, especially between the entities in the story. It is evident that structural frames presuppose some ontological states which allow them to be applied, and these can be modelled in (ontological) frames, as well – this is frame interaction. Structural frames then allow the writer of a proof to select specific information: Although we often present frame constituents without the information to what extent they are core, this gradual importance to the frame is essential in crafting a good proof. The more advanced the readers, the less information one needs to provide, and (besides deviation from default values) coreness is – by hypothesis of the frame model – an important criterion for the selection.Footnote 9

It is evident from the above that an important aspect in a narrative analysis is the aspect of time, both in the original happenings and in the subsequent levels. In this regard, it is interesting that when thinking about proofs, we often impose a (relative) chronology of proof steps. We can distinguish at least two aspects of temporality:

First, a standard example in linguistic textbooks for the timeless reading of the present tense are mathematical statements like “two times three equals six”. Similarly, we will also interpret statements about deduction relations in proofs (“p implies q”) as timeless. Nevertheless, proof texts make essential use of temporal-deictic expressions including grammatical tenses other than the present tense (e.g., “as we have shown”, “as we will see”). Superficially, the temporal deixis can be viewed as a form of text-internal deixis pointing to the previous and following parts of the text. But it is possible to reason about hypothetical parts of a proof (“if we had assumed that p, we would have to show that q”); one can argue that this indicates a certain iconic relation between the textual order and a temporal representation of the proof structure. Temporally ordered parts of a proof text need not be given explicitly but can be indicated (“these steps are continued for...”). More precisely, we should speak about a time-metaphorical representation of the proof structure. Proof steps need not necessarily be conceptualized as totally ordered. Even transfinite numbers of steps can be inserted between two given proof steps by an indication (“we repeat this for all natural numbers \(i\)”).

Secondly, Ganesalingam (2010) observes that a notion of time is underlying mathematical texts in still another sense. Certain notions and notations can change their meaning depending on the context. This happens if we extend an arithmetic operation like multiplication from natural numbers to integers, to rational numbers etc. Formally, we could consider the multiplication in different number systems just as different operations, but many mathematical texts seem to adopt a view that one operation is extended along with the progress of the number systems, i.e. a change of a notion in time.

In any case, the temporal order is normally not a full order, and some may claim it is metaphorical (see also Boolos 1971; Lakoff and Núñez 2000). Moreover, the metaphor may not be maintained throughout the same proof. Still, it is important to note that the assumed order may be different from the order in textual presentation, and the textual descriptions may also talk about infinite sequences of proof steps,Footnote 10 thus paralleling the distinction between time in the narrative happenings and time in the text (see, e.g., the discussion by Martínez and Scheffel 2020, pp. 33ff), but also non-narrative elements (see, e.g. Bal 2009, §§3, 4), which are common in narratology.

Using the levels of proof inspired by narratology, we can thus analyse in an easier and more fine-grained way how and where the textual presentation of the proof differs from a fully explicit story (or even the happenings), a difference that grows the more advanced the audience of a given presentation is.

5 Case Study: Extremal Proofs

In this section we will consider how this model can be aligned with actual mathematical practice, mainly mathematical problem solving. To do this we will study a specific proof technique, namely the use of the extremal principle as taught in Engel (1999). The book is influential and remains close to required reading for students participating in the International Mathematical Olympiad (IMO).Footnote 11

We also consider Carl’s (2017) chapter on extremal proofs, which contains somewhat more general reflections on the proof technique. We will first consider how we learn structural frames in the first place.

We assume that we first internalize proof frames as prototypical blueprints for a proof. This enables us to understand proofs and produce proofs on our own, when we can recognize a proof frame. Note that we could talk here also about understanding of something (e.g. proof techniques) but we do not want to focus on this kind of understanding but on how our internalization of frames could help us to understand proof texts.

But let us now start with the actual case study and examine Engel (1999) and Carl (2017). The structure of Engel’s chapters is as follows: Besides some introductory text we get a collection of problems all evoking one particular theme. The first problems are directly followed by a proof. A larger collection of exercises follows, which also have a solution attached, but the reader is expected to solve them on her own, as much as possible. The chapter on the extremal principle begins as follows:

In this chapter we discuss the extremal principle, which has truly universal applicability, but is not so easy to recognize, and therefore must be trained. [...] We are trying to prove the existence of an object with certain properties. The extremal principle tells us to pick an object which maximizes or minimizes some function. The resulting object is then shown to have the desired property by showing a slight perturbation (variation) would further increase or decrease the given function. [...] We will learn the use of the extremal principle by solving 17 examples from geometry, graph theory, combinatorics, and number theory [...] (Engel 1999, p. 39)

We first see that knowing the problem field we are currently navigating gives us clues about what to expect; this is closely related to the frame interaction mentioned in Sect. 2 and in Fisseni et al. (2019). We read: “The extremal principle thus presupposes a context in which minimal or maximal objects exist”Footnote 12 (Carl 2017, p. 75, translated from German). Speaking about minimal or maximal objects thus presupposes a scale of measurement.

As Carl stresses, there are different principles that guarantee such existence, and Engel mentions “three well-known facts” we need to know in this context. Which principle is triggered depends on the domain: the domain of natural numbers triggers the least number principle in the infinite case and the existence of a maximal or minimal element in the finite case. Closely related is the nonexistence of infinite decreasing sequences in well-ordered domains. A domain that is a subset of the reals triggers the least upper bound principle or the greatest lower bound principle.

Carl (2017, p. 75) explains that prototypical extremal arguments are different depending on the hypothesis:

Proofs using the extremal principle mostly work in one of the following two ways:

  1. An existential statement is to be shown. The extremal object is an example of an object of the desired kind or helps in constructing it.

  2. A universal statement is to be shown. One assumes the contrary, considers an extremal counterexample and works towards a contradiction (mostly with maximality or minimality).Footnote 13 (Carl 2017, p. 75, translated from German)

This already suggests, as a basic (not fully formalized) idea, that the frame contains the following slots:

Scale: How are we measuring it?

Kind of extremality: Is it minimal or maximal?

Principle evoked for existential claim about extremal object: Least upper bound, least number principle ...

A more detailed and formalized version is shown in Fig. 3. We consider frames as being structured in a type hierarchy. Subtypes of extremality proofs are formalized as subtypes (i.e. more special frames) of the extremality frame. Some of the subtypes are given in Fig. 2, and some of these are formalized in Figs. 4, 5, 6, 7 and 8.
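How subtypes inherit and specialize slots can be sketched with ordinary class inheritance. The sketch is only an approximation of the idea of a type hierarchy and does not reproduce the figures: the class names are modelled on the figure captions, the slots `scale`, `kind` and `principle` correspond to the three slots listed above, and the slot `strategy` as well as all default values are assumptions made for illustration.

```python
class ExtremalityFrame:
    """Basic type. A slot value of None means 'not yet filled'."""
    SLOTS = ("scale", "kind", "principle", "strategy")
    scale = None      # how candidate objects are measured
    kind = None       # "minimal" or "maximal"
    principle = None  # principle guaranteeing that an extremal object exists
    strategy = None   # hypothetical slot: how the extremal object is used

    def __init__(self, **fillers):
        for slot, value in fillers.items():
            if slot not in self.SLOTS:
                raise AttributeError(f"unknown slot {slot}")
            setattr(self, slot, value)  # explicit fillers override defaults

    def open_slots(self):
        """Slots for which neither a filler nor an inherited default exists."""
        return [s for s in self.SLOTS if getattr(self, s) is None]


class MinimalityFrame(ExtremalityFrame):
    kind = "minimal"


class MaximalityFrame(ExtremalityFrame):
    kind = "maximal"


class ExtremalContradictionFrame(ExtremalityFrame):
    strategy = "contradiction"


class NaturalNumberExtremalityFrame(ExtremalityFrame):
    principle = "least number principle"  # default filler


class MinimalContradictionOnNaturals(
    MinimalityFrame, ExtremalContradictionFrame, NaturalNumberExtremalityFrame
):
    """Most specific type: inherits one default from each supertype."""


frame = MinimalContradictionOnNaturals()
print(frame.open_slots())  # ['scale']
```

For the most specific frame type, every slot except the scale is already determined by inherited defaults, so choosing the scale is the only decision left to the person applying the frame.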

Fig. 2 Partial type hierarchy of extremality frames

Fig. 3 Basic type of the extremality frame

Fig. 4 Minimality frame

Fig. 5 Maximality frame

Fig. 6 Extremal contradiction frame

Fig. 7 Extremality frame on natural numbers. \((?!)\) marks default fillers of a slot

Fig. 8 Minimal contradiction frame on natural numbers

Now consider one proof problem and the example solution. Problem E9 (Engel 1999, p. 43):

There is no quadruple of positive integers \((x, y, z, u)\) satisfying

$$\begin{aligned} x^2 + y^2 = 3(z^2 + u^2). \end{aligned}$$

Fig. 9 Inferences based on the type hierarchy and the frames in Figs. 2, 3, 4, 5, 6, 7 and 8

Fig. 10 Frame instance resulting from the inferences in Fig. 9

As learners trying to solve this problem from Engel’s book we could adopt the following strategy: The already internalized rules of thumb give us defaults for some slots. The inference steps referenced by Greek letters are shown in Fig. 9. We are talking about integers, so we will evoke something like the least number principle \((\alpha )\). We also must show a general statement (negation of existence), so we expect a proof by contradiction \((\beta )\). So we will look for a minimal such quadruple. Finding the metric according to which we choose a minimal solution is now the main non-mechanical task. But we are still in the part of the chapter where we learn the frame, so we consult Engel’s solution presented right after the question, and see that we should assume that there exists at least one such quadruple and choose one with minimal \(x^2+y^2\) \((\gamma )\), say \((a, b, c, d)\) \((\delta )\). The resulting frame instance is shown in Fig. 10. We then realize that the equation tells us that \(a^2 + b^2\) is divisible by 3, and thus that a and b are both divisible by 3: a square leaves the residue 0 or 1 modulo 3, so a sum of two squares is divisible by 3 only if both squares are. Writing \(a = 3m\) and \(b = 3n\) with positive integers m and n, we can rephrase the equation as:

$$\begin{aligned} 9m^2 + 9n^2 = 3(c^2 + d^2) \end{aligned}$$

Dividing by 3 and swapping the two sides, we obtain

$$\begin{aligned} c^2 + d^2 = 3(m^2 + n^2), \end{aligned}$$

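so that \((c, d, m, n)\) is again a quadruple of positive integers satisfying the equation, now with \(c^2 + d^2 = (a^2 + b^2)/3 < a^2 + b^2\), contradicting the minimality of \(a^2 + b^2\).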
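The two facts used in this argument can be checked mechanically. The following sketch is not part of Engel's solution; it merely enumerates the residues of squares modulo 3 and searches a small range for quadruples satisfying the equation. The function names and the search bound are arbitrary choices made for illustration.

```python
def square_residues_mod_3():
    """Residues that a square can leave modulo 3."""
    return {(r * r) % 3 for r in range(3)}


def residue_pairs_with_sum_divisible_by_3():
    """Pairs of residues (r, s) with r^2 + s^2 divisible by 3."""
    return [
        (r, s)
        for r in range(3)
        for s in range(3)
        if (r * r + s * s) % 3 == 0
    ]


def solutions_up_to(bound):
    """All quadruples of positive integers <= bound with x^2 + y^2 = 3(z^2 + u^2)."""
    values = range(1, bound + 1)
    return [
        (x, y, z, u)
        for x in values
        for y in values
        for z in values
        for u in values
        if x * x + y * y == 3 * (z * z + u * u)
    ]


print(square_residues_mod_3())                  # {0, 1}
print(residue_pairs_with_sum_divisible_by_3())  # [(0, 0)]
print(solutions_up_to(20))                      # []
```

The finite search is of course no substitute for the proof. It only confirms, for small values, what the descent argument establishes in general.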

Now consider another example:

\((2n+1)\) persons are placed in the plane so that their mutual distances are different. Then everybody shoots his nearest neighbor. Prove that (a) at least one person survives; [...] (Engel 1999, p. 48)

Since the distance is real, we could look for minimal and maximal distances. The main insight to solve (a) is to look at the two people who are nearest to each other. They will shoot each other. If somebody else shoots one of them, then one person receives at least two bullets; as there are only as many bullets as people, somebody receives none, and we are done. If not, we can remove the two people from the picture and repeat the argument. As we are an odd number of people, this has to end eventually, leaving one person in the worst case.
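The argument can be illustrated by a small simulation. The following is a minimal sketch in Python under our own assumptions: persons are represented as coordinate pairs, the function names are hypothetical, and the mutual distances are assumed to be pairwise different (ties are not handled in any principled way).

```python
import math


def nearest_neighbor_targets(points):
    """For each person i, return the index of the nearest other person."""
    targets = []
    for i, p in enumerate(points):
        others = (k for k in range(len(points)) if k != i)
        targets.append(min(others, key=lambda k: math.dist(p, points[k])))
    return targets


def survivors(points):
    """Indices of the persons whom nobody shoots."""
    shot = set(nearest_neighbor_targets(points))
    return [i for i in range(len(points)) if i not in shot]


# Three persons on a line at positions 0, 1 and 3 (distances 1, 2 and 3).
example = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
targets = nearest_neighbor_targets(example)   # the closest pair shoots each other
alive = survivors(example)
```

In the example the two closest persons shoot each other and the third person shoots one of them, which is the first case of the argument. With only two persons, a case not covered by the problem, the same functions report no survivor.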

When reading these sketches of the proofs, it seems that we mainly give those fillers of slots that are not already clear from the context. As we cannot expect the reader of this paper to have internalized a frame for extremal proof, we made the other points explicit as well, but the solution for the first problem between students with some training could be as simple as: “Prove it by the extremal principle and minimize \(x^2+y^2\).” The second problem could be something like: “Always consider the two people closest to each other and eliminate such pairs inductively”, evoking another proof frame, namely induction, as well. Sometimes this mere skeleton of the proof is the most essential part.

Actually in some examples communicating the details might be confusing. Take for instance a proof of the fundamental theorem of algebra. Argand did this by the extremal principle in 1814 (cf. Dawson and John 2015). The fundamental theorem says that a non-constant complex polynomial p(x) has a zero. We look at the polynomial within a sufficiently large compact disk around the origin. The absolute value \(\vert p(x) \vert \) must attain a minimum on this disk, say at \(x_0\). Suppose this minimum is not zero. When we slightly vary around that minimum, the change in the value of p is dominated by the least power of the variation \(\delta \) with a non-zero coefficient. The variation can be chosen in such a way that \(\vert p(x_0 + \delta ) \vert \) is smaller than \( \vert p(x_0) \vert \), but this calculation is tedious and it hides the main idea that we work in an extremal setting. So, the full presentation of the calculation might lead a reader to focus on the fact that it yields a contradiction but hides the starting point of our thoughts, namely the idea of using an extremal proof.

The expectation evoked by the identification of a frame can be used as a heuristic tool. Let us consider an extremal proof of a negative result in graph theory. The graph frame has slots, among others, for the numbers of vertices and edges, the treewidth, the independence number as well as the diameter. These slots are filled by natural numbers, thus evoking the natural number frame. This is an ontological frame, which in turn evokes certain proof frames like induction and the least number principle. In interaction with the extremal frame, it evokes the proof frame of the least number principle, because the least number with a given property is an extremal object. So we would first try to make an (assumed) minimal example even smaller, which would constitute a contradiction. But this expectation of minimizability driven by the natural number frame can be withdrawn. After problems with this approach (or due to experience with other proofs), some graph parameters might become salient, like the number of edges in a given graph. But this would open up the possibility of a contradiction with the maximal number of edges, or more often with a common approximation of it, namely \(n^2\). The dynamics of the interaction of the domain of discourse with the proof frames can be made more concrete in a frame approach, and it can be implemented and used in a translation from frames to automated theorem proving.
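To indicate what such an implementation might look like, the following is a toy sketch in Python of the evocation chain just described. The dictionaries and function names are hypothetical simplifications of ours and do not reproduce the frame formalism of the earlier sections: slots of the graph frame are typed as natural numbers, the natural number frame evokes two proof frames, and the interaction with the extremal frame keeps only those proof frames that supply an extremal object.

```python
# Hypothetical toy encoding; all names are illustrative assumptions.

# Ontological frames: slot name -> type of the expected filler.
ONTOLOGICAL_FRAMES = {
    "graph": {
        "number of vertices": "natural number",
        "number of edges": "natural number",
        "treewidth": "natural number",
        "independence number": "natural number",
        "diameter": "natural number",
    },
    "natural number": {},
}

# Proof frames evoked by an ontological frame.
EVOKED_PROOF_FRAMES = {
    "natural number": ["induction", "least number principle"],
}

# Proof frames that supply an extremal object, as the extremal frame requires.
YIELDS_EXTREMAL_OBJECT = {"least number principle"}


def evoked_by(frame_name):
    """Collect proof frames evoked by a frame directly or via its slot fillers."""
    evoked = list(EVOKED_PROOF_FRAMES.get(frame_name, []))
    for filler_type in ONTOLOGICAL_FRAMES.get(frame_name, {}).values():
        for proof_frame in EVOKED_PROOF_FRAMES.get(filler_type, []):
            if proof_frame not in evoked:
                evoked.append(proof_frame)
    return evoked


def expected_with_extremal_frame(frame_name):
    """Proof frames that remain expected once the extremal frame is active."""
    return [f for f in evoked_by(frame_name) if f in YIELDS_EXTREMAL_OBJECT]
```

The withdrawal of an expectation, as in the switch from minimal examples to edge counts, is not modeled here; it would require a way to retract defaults.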

Frame-driven expectations help while reading a mathematical proof, as it is common to try to construct a proof oneself while reading a text. Actually the many gaps common in (advanced) texts can often not be resolved without such expectations. As mentioned in the introduction, frames explain how we fill in gaps. It is an empirical question whether the gap filling process is correctly predicted by frames. Some first evidence is provided by Fisseni et al. (2019) where, when told to rephrase a proof for first-year students, two mathematicians made explicit exactly the hypothesized frame slots.

This idea of gaps is actually epistemologically significant. We mentioned the derivation-indicator view, explaining the convergence of mathematical results by hinting at underlying derivations. Frames could empower this idea as a bridge between informal and formal proofs. But a frame approach could also go in the other direction. As the frames provide a high-level view of the proof, understanding a proof on a frame level allows one to compensate for mistakes in details and to distinguish minor mistakes from fundamental mistakes in a proof. Moreover, expert reviewers check proofs on various levels, subsuming under familiar frames, not necessarily going all the way “down” to the details. Frames thus allow us to explicate the concept of “the basic idea” of a certain proof. This hints towards another philosophical debate, namely the identity of proofs.

The explanatory power of the frame approach in this area is wider than strictly epistemic questions. We could in principle tackle questions of style. What is an allowable gap? Judgements on the allowability of gaps are of course highly context-dependent, because the allowability of gaps depends on the intended readership. Our frame-based prediction for when a gap is judged allowable is as follows: A gap is allowable when a typical member of the intended readership is able to fill in the gap by using a frame or multiple interacting frames, where all applied frames have to be already internalized by the typical member of the intended readership, and for all slots that get filled with a non-default value, the slot has been made salient through explicit mention or appropriate frame interaction. The expected context-dependence of such judgments is now captured by the dependence of this prediction on the choice of the intended readership.
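The two checkable conditions of this prediction can be written down as a predicate. The following is a minimal sketch in Python under strong simplifying assumptions of ours: a frame is reduced to a name with default fillers, a reader is reduced to the set of frame names they have internalized, and the reader's actual ability to carry out the filling is taken for granted. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    defaults: dict = field(default_factory=dict)  # slot name -> default filler


@dataclass
class Gap:
    frames: list   # frames needed to fill the gap
    fillers: dict  # (frame name, slot name) -> filler the proof requires
    salient: set = field(default_factory=set)  # slots made salient by the text


def gap_is_allowable(gap, internalized):
    """Check both conditions for a typical reader with the given internalized frames."""
    frames = {f.name: f for f in gap.frames}
    # Condition 1: every applied frame is already internalized.
    if not all(name in internalized for name in frames):
        return False
    # Condition 2: every slot with a non-default filler has been made salient.
    for (frame_name, slot), filler in gap.fillers.items():
        default = frames[frame_name].defaults.get(slot)
        if filler != default and (frame_name, slot) not in gap.salient:
            return False
    return True


# The hint "use the extremal principle and minimize x^2 + y^2" from above.
extremal = Frame("extremal", {"proof type": "contradiction",
                              "extremality": "minimal"})
fillers = {("extremal", "proof type"): "contradiction",
           ("extremal", "extremality"): "minimal",
           ("extremal", "scale"): "x^2 + y^2"}
hint = Gap([extremal], fillers, salient={("extremal", "scale")})
bare = Gap([extremal], fillers)  # same gap, but the scale is never mentioned
```

For a readership that has internalized the extremal frame, `hint` is predicted to be allowable and `bare` is not; for a readership without that frame, neither is. The dependence on the second argument is where the context-dependence of the judgment enters.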

This makes apparent why frames could work as a unifying tool to model mathematical understanding of proof texts close to the actual practice and (in principle) in a way that could yield empirically testable predictions or in an implementable way for automated/interactive theorem proving.

We could also extend this to other techniques. The chapter of Engel (1999) that follows the chapter on extremal proofs discusses the box principle, also called pigeonhole principle. Here the important slot of the frame is to be filled with the (class of) objects that serve as boxes. When we prove that in a school with 500 children there is at least one day that is more than one child’s birthday, the boxes are apparently the 366 days of a (leap) year. When proving a more complex proposition, the idea of how to find the right ‘boxes’ might be the only thing required by a mathematician who is well aware of the box principle. During the first semester of the university studies of a mathematician, they encounter several new such proof frames. Most of them will be shown to them first in greater detail, later in less detail. The students then apply the learned examples to slightly different tasks. If all goes well, they later realize how to transfer and adapt the learned techniques to a novel context.
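The role of the ‘box’ slot can be made explicit in a few lines. In the following minimal sketch in Python (the function name and the randomly generated birthdays are our own illustrative assumptions), the proof method is a fixed routine and the only thing the user has to supply is the filler of the box slot, here a function assigning each object its box.

```python
import random


def first_collision(objects, box_of):
    """Return two objects that share a box, or None if all boxes are distinct."""
    seen = {}
    for obj in objects:
        box = box_of(obj)
        if box in seen:
            return seen[box], obj
        seen[box] = obj
    return None


# Hypothetical data: 500 children, each with a birthday among 366 days.
rng = random.Random(0)
birthday = {child: rng.randrange(366) for child in range(500)}

# Filling the box slot with "day of the year" is the only creative step.
pair = first_collision(range(500), lambda child: birthday[child])
```

Since there are more children than days, the routine must return a pair, whatever the birthdays are; with at most 366 children it may return `None`.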

6 Frames and Understanding

Even though the previous section only sketched the use of frames with two example problems, we have already observed several respects in which this approach explains different objectives of understanding. The recognition of proof methods (TR) is closely related to frames. We assume that such a proof method frame is activated by different linguistic and symbolic cues. Induction is an example of a frame which is actually very structured. Phrases like “base case” or “induction step” – or, more subtly, the evocation of an ontological frame that can function as an induction domain, such as natural numbers or recursively defined formulas – may trigger this frame, i.e. make it obvious that we are in an induction proof. In some sense, this already automatizes different steps in TC, the choice of the way of proving a given statement. We saw in the case of extremal proof that the domain influences the choice of specific fillers for given slots. An explanation using frames is more difficult when the choices are between completely different techniques, like why we used the extremal principle but not a proof by ‘diagram chasing’. The frame offers us referents and structure that do not need to be made explicit within the proof text. Assume we work in an extremal setting of a general assumption. The phrase “contradiction” would already signal the extremality of the example, without need to stress this for a reader aware of the extremal frame. In this sense frames also help to bridge deductive gaps (TB). The question of the justification of the proof steps (TJ) presented in a proof text depends on our concept of the internalization of the frames. If we agree that the internalization of the frames contains an understanding of why these proof techniques actually work, then we would be able to justify the steps. Identification of frames alone does not suffice, as it need not mean that the individual has actually been convinced. We could imagine that even a computer could do this.

More generally the evocation of the adequate frame is in part already understanding of a proof in the sense of the subsumption under a more general structure. I am understanding that a given proof is one of a definite kind. By doing so I am embedding the proof into my background knowledge. A trained mathematician might have more knowledge about such kinds of proofs. We would also say that this high-level understanding becomes more important the more one masters a field. A more experienced researcher can grasp a lot of a paper just by skimming it. Similarly, an experienced researcher can communicate a lot of information in a few short sentences triggering the right background structures in their counterpart. This is an important point: Subsuming under a more general and more familiar structure helps to find connections to our background knowledge. It is also related to several of the indicators discussed by Avigad, as we will see in the next section.

One might be tempted to argue that we only focus on proofs as we find them in our mathematical training, but we would disagree. Admittedly, it is not obvious that very complex proofs can meaningfully be translated as iterated evoking of frames. However, though each new breakthrough may eventually constitute a new frame, it is crucial to see that a big part of the research output is not solely creative work, but the usage of well-known methods. This might not work out straightforwardly and methods might need to be adapted to fit. Consider for instance the introduction of a thesis on Flag Algebras in Extremal Graph Theory, a tool that can be adapted to solve a lot of problems that seemed to be out of reach:

The aim of this thesis is to present new method based on algebraic and analytic tools – the celebrated method of flag algebras invented by Razborov [67]. This method provides a uniform framework for standard counting techniques used in extremal combinatorics. It is inspired by the theory of dense graph limits, on which we focus in Chapter 4. Despite the fact that the method is quite new, it has been successfully applied to various problems in extremal combinatorics, giving solutions to many long open-standing problems. In particular in Turán-type problems in graphs [23, 35, 39, 41, 61, 63, 64, 70, 74, 76], 3-graphs [7, 27, 28, 32, 62, 69], and hypercubes [5, 8], Caccetta-Häggkvist conjecture [42, 71], extremal problems in a colored setting [6, 22, 38, 50], and in geometry [51]. More details on these applications can be found in a recent survey of Razborov [68]. (Grzesik 2014, p. 2)

Actually a large part of research is extracting methods from new proofs and putting them into a more canonical setting. When such a new proof and the techniques developed to achieve it are well understood and we pass on the knowledge to further generations of mathematicians, this may form a new frame, which could even be something different from the methods used in the original groundbreaking result. To give another example: Moore (1987) analyses the history of the forcing method and we can see how, while Cohen originally developed this method to prove the independence of the Continuum Hypothesis from ZFC, this technique was further developed by Feferman, Levy, Scott, and especially Solovay. This work turned forcing into an accessible technique.

7 Comparing to Avigad’s (2008) Criteria of Understanding Proofs

When studying the question of how the ability to identify frames is connected to proof understanding, it is inevitable to consider the question what it means to understand a proof. In this section, we will consider a partial approach to this question by Avigad (2008), who gave several examples of abilities that someone who understands a proof should have. We do not regard this list as exhaustive, nor do we subscribe to the view that proof understanding can be completely operationalized by such a list (we point out that Avigad does not make this claim, either); however, we certainly agree that the points below belong to proof understanding. We will now go through Avigad’s points separately, commenting in each case on whether and how the ability to identify frames in a proof text is relevant for them.

In the following, the text in italics is always an ability that Avigad (2008) proposes as relevant for ascribing understanding, while the roman text presents our discussion of how the ability in question can be explicated in terms of the frame approach and in terms of the multi-level approach from narrative analysis. The abbreviations in parentheses relate these abilities to the objectives of understanding introduced in Sect. 3.

  1.

    the ability to respond to challenges as to the correctness of the proof, and fill in details and justify inferences at a skeptic’s request (TB, TJ)

    Typical methodological frames, like induction, the extremal principle, the pigeonhole principle etc. work as proof templates: When one has understood why the proof method represented by the frame does indeed yield the supposed result, explaining proof steps becomes a special case thereof. For example, the classical ‘domino’ picture of induction can help in justifying an inductive proof. Moreover, the knowledge of frames may help with “debugging” in the case of disagreement about a proof, i.e. in pinning down the precise point where the objection lies. If someone doubts the correctness of the infinite descent argument for the nonsolvability of \(x^2+y^2=3(z^2+u^2)\) explained above, not seeing at all why this argument yields the desired conclusion (which, as teaching experience shows, happens rather frequently), the structured understanding of the proof that is represented by frames would suggest at least five more specific points, namely (i) proof by contradiction in general, (ii) the least number principle in the positive integers, (iii) the combination of (i) and (ii) in the method of infinite descent, (iv) the specific term manipulations used in this case and (v) the specific number-theoretical background required here, such as calculation with residues. Here, (i), (iii) and (iv) would be structural frames, while (ii) and (v) are ontological frames. Thus, at least in some cases, being able to identify frames allows one to respond to critical challenges.

    However, identifying a proof method in itself is certainly not sufficient for being able to fill in gaps. Thus, frame identification is certainly related to, but not sufficient for, responding to criticism.

    Challenges of correctness are analogous to investigating the internal causal coherence of a fictional narrative: For example, when asked why Odysseus had such great difficulties returning home, one would point out that Poseidon was angry at Odysseus because he had blinded Poseidon’s son. Both tasks demand abstracting from the concrete presentation level of the text and reconstructing the story selected by the author of the text. In case of proofs, this encompasses understanding the (causal) justifications for the various proof steps.

  2.

    the ability to give a high-level outline, or overview of the proof (TR, TC)

    Here, the connection to frames is quite obvious: First, giving the overall strategy like “by contradiction/induction/infinite descent/...” itself is part of giving an overview; moreover, the crucial parts can be expected to occur at the places where the slots are filled. In the other direction, the knowledge of the required frames must be expected on the part of the addressee of such an overview in order to make sense of the outline. Several examples for this have been provided in Sect. 5 above, one of which we recall here: Given the extremal frame, the proof for the nonsolvability of \(x^2+y^2=3(z^2+u^2)\) could be summarized as “Pick a solution with \(x^2+y^2\) minimal and show that everything is divisible by 3, so that there is a smaller one.” To make sense of this outline, one must first identify the instruction to pick an example of the type of object whose nonexistence is to be shown as a hint that a proof by contradiction is coming. One will now read the rest of the outline as a description of how to work towards a contradiction. Now, “with \(x^2+y^2\) minimal” and “so that there is a smaller one” specify where the contradiction is supposed to lie, which specifies that the proof by contradiction is one by the extremal principle and fills in many of the slots of the extremal frame shown above: The underlying class is the positive integers, the scale is \(x^2+y^2\), the kind of extremality is minimality, the proof type is contradiction, the assertion is universal. Now, “pick” presupposes the existence of such an object, a presupposition that is discharged since, within the ontological frame of the positive integers, the least number principle is available. At this point, knowledge of the extremal frame allows one to understand that “there is a smaller one” means “there is a solution for which \(x^2+y^2\) is smaller”. This leaves “show that everything is divisible by 3” as a sketch of how the existence of such a solution is to be deduced. Here, we see quite well how frames allow us to model rather precisely the background knowledge necessary for communicating such “outlines”.

    In terms of representational levels, this point is ambiguous between an outline of the proof presentation (text) and the proof story. Both can be relevant, and would correspond to identifying structural frames in writing and arranging (such as the way to write up induction proofs; narrative levels of story and narrative) and ontological frames (such as induction), respectively.

  3.

    the ability to cast the proof in different terms, say, eliminating or adding abstract terminology (TJ, TB, TR, TC)

    The reformulation in terms of abstract terminology, for example, the reformulation of a number-theoretical proof in terms of the theory of Euclidean rings, is often a prototypical example of subsuming given concepts under a more general frame. Here, a good knowledge of the relevant ontological frames and their respective relations is clearly helpful. In fact, a frame explicates not only the proof structure, but also the interaction of various proof steps with the features of the occurring ontological frames, such as the least number principle in the case of extremal arguments. Thus, understanding the extremal frame enables one to see rather clearly the assumptions on which, e.g., the nonsolvability proof mentioned above is based.

    This ability relates to the narrative levels in the sense that we give a different presentation of the same story. Abstract or just different terminology is a well-known stylistic choice (“the hero”, “the cunning destroyer of cities”, “Odysseus”), but knowing which terms may be applied demands knowledge of the story and the overall world in which the story takes place.

  4.

    the ability to indicate ‘key’ or novel points in the argument, and separate them from the steps that are ‘straightforward’ (TR, TC)

    As mentioned above, “key points” will often correspond to the slots in a frame: for instance, assumption and contradictory conclusion are key points in a proof by contradiction, base case and inductive step are key points in an induction proof. In other cases, the use of a certain frame is itself a “key point” in the argument. However, heuristic “key points” will also be such points that are precisely not motivated by the frame itself, but in some way ‘original’ or surprising. Given a clear view of a proof in terms of methodological frames, such points will become apparent as that which is not explained by the frame. Thus, in the nonsolvability proof by infinite descent discussed above, it is a “key point” to use infinite descent and to pick \(x^2+y^2\) as the measure to be minimized; the former is the frame used here, the latter a slot in the frame. This then leaves the details of the existence proof for a smaller solution as a further “key point”, which now becomes apparent exactly because it is not inherent in the frame structure. In the case of Engel’s shooting problem, a further “key point” is the interaction of the extremal frame with the induction frame.

    This point connects to the discussion in narrative research whether there must be some point to the story event, or at least the presentation of the proof (see e.g. Hamilton and Breithaupt 2014; Labov 1997; Abbott 2014).

  5.

    the ability to ‘motivate’ the proof, that is, to explain why certain steps are natural, or to be expected (TR)

    Clearly, a methodological frame motivates the introduction of intermediate steps as subgoals. Thus, questions like “Where does this lemma come from?” can often be answered from identifying frames in a proof text. Moreover, methodological frames also motivate the introduction of new objects that often seem to “appear from nowhere” for the uninitiated: This is particularly obvious in the case of the pigeonhole principle, where the given situation needs to be “reframed” in terms of “objects” and “boxes” in such a way that there are fewer boxes than there are objects. For example, when attempting to use the pigeonhole principle to prove that, among \((n+1)\) numbers from \(\{1,2,\ldots ,2n\}\), two are relatively prime (see Footnote 14), one will want to form \(\le n\) “boxes” out of the elements of \(\{1,2,\ldots ,2n\}\) such that, in each box, all elements are pairwise relatively prime. This new task is triggered by the pigeonhole principle, and, once formulated, easy to achieve. Another typical example would be the choices of ‘\(\delta \)s’ in \(\varepsilon \)-\(\delta \)-proofs of continuity, which often come about as unmotivated complex terms that work for some reason, but can often easily be explained with the help of a few term manipulations when one writes down the properties that such a term needs to have in order to do its work in such a proof. The same can be said of the extremal proofs discussed above: Once infinite descent has been chosen as a proof strategy – which itself is motivated by the type of assertions for which the frame works – and \(x^2+y^2\) has been chosen as the quantity to be minimized, it is, for example, clear that some kind of work towards a smaller solution must follow.

    This aspect of understanding necessitates understanding the story of the proof. We surmise that the question of potential linearization of the steps, i.e. the level of narrative (intermediate between story and presentation), is also involved here.

  6.

    the ability to give natural examples of the various phenomena described in the proof (TJ, TR, TC)

    This aspect bears little relation to frames. From the point of view of the levels of narratives, one can say that to reconstruct the happenings related to a constructively proved existential statement means to explicitly construct at least one object with the required properties.

    This again necessitates understanding the story of the proof.

  7.

    the ability to indicate where in the proof certain of the theorem’s hypotheses are needed, and, perhaps, to provide counterexamples that show what goes wrong when various hypotheses are omitted (TJ, TB, TR, TC)

    A methodological frame has slots both for assumptions and conclusions, often containing subframes for subgoals that occur in a proof (like the base case and the step in an induction proof). Thus, identifying frames helps in revealing the information flow of a proof. As an example, let us consider the question what goes wrong when the shooting in the example from Engel’s book discussed above takes place between infinitely many people. Note that, in a written version of this proof, the fact that a finite set of real numbers contains its minimum might not be explicitly mentioned, so that “going along the text” alone will not necessarily make the difficulty apparent. However, when considering the extremal frame, one will notice that the “boundary slot” can no longer be filled, as no least number principle for infinite sets of real numbers is available. This, then, motivates looking for a counterexample by considering a situation in which no minimal distance exists, which is then easily achieved. (Note that, in this particular case, the least number principle underlies already the formulation of the problem, as among infinitely many people, not everyone might have a “nearest neighbour”). Now suppose that we drop the requirement that the number of people is odd. In the induction part of the proof, we see an induction that proceeds by increasing n by 2. Dropping the oddness assumption, this corresponds to a general induction frame in which there are several base cases and the “step size” is greater than 1. In this induction frame, we will then see that a new base case occurs, namely \(n=2\). We will thus look for counterexamples with 2 persons, at which point it becomes very obvious that in fact any situation with 2 persons is a counterexample.

    From the point of view of the levels of narratives, one can say that if one has succeeded in reconstructing the story intended by the author of the proof text (see Footnote 15), then one knows to what extent the claims made in the proof depend on the theorem’s hypotheses and therefore possesses this ability. Note that this is closely related to the narrative-based analysis of the first ability, only that the other direction of logical consequence is made use of.

  8.

    the ability to view the proof in terms of a parallel development, for example, as a generalization or adaptation of a well-known proof of a simpler theorem (TR, TC)

    This, too, lends itself rather easily to an interpretation in terms of frames: Having recognized the frame-structure in two different proofs, it is easy to see that two proofs are structurally identical or to see where and how one proof extends or goes beyond another. This aspect of understanding also necessitates reconstructing the story, and structural frames must be compared and related to each other to discover the similarities.

  9.

    the ability to offer generalizations, or to suggest an interesting weakening of the conclusion that can be obtained with a corresponding weakening of the hypotheses (TJ, TR, TC)

    This is strongly linked to the last point: To the extent that frames help in clarifying the information flow in a proof, it also helps to see what conclusions become unsupported once certain assumptions are dropped or weakened. This aspect of understanding also necessitates reconstructing the story and understanding the underlying representation of the world.

    Both last points, generalization and weakening, may involve detecting structural frames, the ontological frames with which they interact, and the conditions of interaction.

  10.

    the ability to calculate a particular quantity, or to provide an explicit description of an object, whose existence is guaranteed by the theorem (TJ, TR, TC)

    In general, this ability seems to go beyond what can be achieved by a mere knowledge of the occurring frames. When an existence proof is completely constructive such as the proof for the existence of interpolation polynomials, it can be regarded as an algorithm for generating the object in question, and a good grasp of the underlying frames will allow one to instantiate the relevant parameters in the right way. Several layers of sophistication higher, Kreisel’s “unwinding” program for extracting constructive information from proofs, which has been particularly successfully implemented by Kohlenbach (2008), is based on very detailed analysis of the respective proofs. Whether this can be connected to frames is a question that we do not dare to decide at this moment.

    Again, we need the concept of an underlying representation of the story (and world) of the proof to address this aspect.

  11.

    the ability to provide a diagram representing some of the data in the proof, or to relate the proof to a particular diagram (TJ, TR, TC)

    This seems to bear little relation to frames. One possible connection might be that a mathematical frame might be similar or analogous to a frame known from other contexts, which may help to visualize a situation, the ‘domino’ picture of induction again being a prototypical example.

    This aspect of understanding is a clear example of telling the same story in a different presentation, and comparing presentations. Analogously, one can compare the story and narrative presented in different modalities, such as a film and an epic of the Trojan War. Assuming that a diagram and a certain proof (partly) represent the same narrative (and hence trivially the same story) but differ in presentation, we can then distinguish (structural) frames that relate the story to the narrative from those that allow the presentation, i.e. depiction in a diagram or the verbalization in a proof text. Maintaining a multi-level view of proofs therefore enables a more fine-grained analysis of the relation between proof texts.
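The pigeonhole example discussed under the fifth ability can be made concrete. The following minimal sketch in Python rests on an assumption of ours: it fills the box slot with the pairs \(\{2k-1, 2k\}\), which is one standard choice and not necessarily the one intended in the footnote, and the function names are hypothetical. Two numbers in the same box are consecutive and hence relatively prime, and there are only n boxes for \(n+1\) numbers.

```python
from itertools import combinations
from math import gcd


def box_of(m):
    """Box index of m, for the boxes {1, 2}, {3, 4}, ..., {2n-1, 2n}."""
    return (m + 1) // 2


def pair_in_same_box(numbers):
    """Return two of the given numbers that share a box, or None."""
    seen = {}
    for m in sorted(numbers):
        box = box_of(m)
        if box in seen:
            return seen[box], m
        seen[box] = m
    return None


def check_all_subsets(n):
    """Check the claim for every choice of n + 1 numbers from {1, ..., 2n}."""
    for subset in combinations(range(1, 2 * n + 1), n + 1):
        pair = pair_in_same_box(subset)
        if pair is None or gcd(*pair) != 1:
            return False
    return True
```

The exhaustive check only covers small n; the general statement follows from the choice of boxes, which is exactly the slot filler that the frame asks for.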

Thus, for most of the abilities listed by Avigad (2008), we have been able to explicate them in terms of the frame approach.

Additionally, we have related some of these abilities to the levels of narratives. Using the multi-layered analysis, we noticed that most of them relate to the story level, or at least are independent of the narrative and presentation level. This reflects the fact that our and Avigad’s concept of proof understanding is in principle independent of the specific order and form of presentation of mathematical statements. Using several layers of analysis, we can attempt to describe not only that some presentations are to some extent equivalent, but also how and to what extent.

Furthermore, we have listed how these abilities relate to the understanding objectives introduced in Sect. 3. Note that understanding O does not appear among the understanding objectives related to the abilities listed by Avigad. This agrees with the fact that understanding O is an idealized form of understanding that is very hard to operationalize and that is not commonly applied in actual mathematical practice and therefore not presupposed by Avigad’s criteria.

8 Conclusion

In this paper we have made use of the frame concept to shed some light on the nature of mathematical understanding, illustrating the practical applicability of these analytical tools through a case study on extremal proofs, as well as comparing our analysis to the ability-based analysis of proof understanding due to Avigad (2008). We also showed that it is helpful to use a multi-layered analysis of proof structure akin to the levels of analysis applied to narrative texts, especially to understand the difference between structural and ontological frames.

Beyond the applications of the frame concept to proof understanding, this concept can also be utilized to explain some other phenomena of mathematical practice. For example, it is often observed that more experienced mathematicians communicate their proofs more succinctly. Using the frame concept, this can be explained through the fact that they have acquired more frames, allowing them to successfully integrate explicitly provided information with their background knowledge even when the explicitly provided information contains many gaps.

Future work will include formalizing more frames, identifying linguistic and symbolic cues that trigger given frames, and investigating other philosophical topics. The frame concept may shed light on a potentially interesting view on creativity, namely creativity as the new (fruitful) combination of old ideas, or a choice of such a combination among the manifold of all possible combinations. The exploration of further applications of the frame concept for the purpose of analyzing mathematical practice is left to future work.