Proof is fundamental in mathematical practice. In contrast to the empirical sciences, mathematical research does not only provide evidence, but generally results in statements that are proven to be true or false within a particular axiomatic system. An understanding of statements and the underlying theories is vital for learning about proof, as Mariotti (2006) emphasizes: “It is not possible to grasp the sense of a mathematical proof without linking it to the other two elements: a statement and overall a theory” (p. 184). Understanding mathematical statements, in particular their generality, and knowing and accepting the theoretical framework from which the truth of a statement is derived can both be seen as necessary prerequisites for students’ understanding of and learning about proof (see also Balacheff, 2010).

In this chapter, I first explain the meaning of the two main types of quantified statements: universal statements and existential statements. Second, I discuss the meaning of generality in mathematics, other sciences, and everyday life, as well as the difficulties that the concept of mathematical generality may pose for students. While the terms proof, proving, argumentation, and reasoning are widely used in mathematics education, there is no clear consensus about their specific meanings or the distinctions between them. I therefore review and discuss different notions, usages, and views of these terms as well as the relations between them. Finally, I review different types of arguments that are commonly distinguished in mathematics education.

2.1 Mathematical Statements

A mathematical statement is a declarative sentence that is either true or false. Its truth value need not be known yet (and may never be known). For example, consider the following two statements:

  • Every even number greater than 2 can be represented as the sum of two primes.

  • There is an even number greater than 2 that cannot be represented as the sum of two prime numbers.

Both sentences are either true or false and thus qualify as mathematical statements. The second statement is the logical negation of the first; therefore, exactly one of these statements is true. Which one is still an open problem (see Footnote 1; e.g., Schütte, 1977).
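The difference between evidence and proof for the first statement can be made concrete with a small computation. The following is a minimal sketch, not part of the original argument: the helper names `is_prime` and `goldbach_witness` and the bound `LIMIT` are assumptions chosen for illustration. It checks the statement for every even number in a finite range only.

```python
def is_prime(n: int) -> bool:
    """Trial division; sufficient for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def goldbach_witness(n: int):
    """Return primes (p, q) with p + q == n, or None if no such pair exists."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None


# Hypothetical bound: the check covers only finitely many even numbers.
LIMIT = 1000
failures = [n for n in range(4, LIMIT + 1, 2) if goldbach_witness(n) is None]
print(failures)
```

An empty list of failures is evidence for the universal statement on the checked range, but it says nothing about the infinitely many even numbers beyond `LIMIT`, which is why the truth value remains open.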

Like the two statements above, most mathematical theorems take one of the following two forms (see Footnote 2):

  • For all objects x within a given domain D, property P(x) holds.

  • There is at least one object x within a given domain D, for which a property P(x) holds.

Statements of the first category are called universal statements (sometimes also general statements). They can be written formally, i.e., in first-order logic, using the universal quantifier: \(\forall x\in D: P(x)\). Universal statements, when expressed informally (i.e., in natural language), often contain words like (for) all, every, each, always, etc. Examples of universal statements include “For all odd numbers a and b, \(a+b\) is even”, “Prime numbers greater than 2 are always odd”, and “Every human is mortal” (even though the latter might not be provable unless we define humans as mortal). Statements of the second category are called existential statements and can be represented formally with the existential quantifier: \(\exists x\in D: P(x)\). Informally written, existential statements use expressions like there is/exists, at least, (for) some, etc. For instance, “There exists a natural number n such that \(n^2=10\)”, “At least one prime number is even”, and “Some dolphins are pink” are existential statements (and only one of them is false).
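On a finite domain, the two quantifiers correspond directly to Python's built-in `all` and `any`. The sketch below evaluates the mathematical examples above on the assumed domain D = {1, ..., 100}; the domain and all variable names are illustrative choices, not taken from the text.

```python
def is_prime(n: int) -> bool:
    """Trial division; sufficient for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Assumed finite domain standing in for the natural numbers.
D = range(1, 101)
odds = [x for x in D if x % 2 == 1]
primes = [x for x in D if is_prime(x)]

# Universal statements: P(x) must hold for every x in the domain.
sum_of_odds_is_even = all((a + b) % 2 == 0 for a in odds for b in odds)
large_primes_are_odd = all(p % 2 == 1 for p in primes if p > 2)

# Existential statements: P(x) must hold for at least one x in the domain.
some_square_is_ten = any(n * n == 10 for n in D)
some_prime_is_even = any(p % 2 == 0 for p in primes)

print(sum_of_odds_is_even, large_primes_are_odd)
print(some_square_is_ten, some_prime_is_even)
```

A `True` result from `any` settles an existential statement, because a single witness suffices. A `True` result from `all` on a finite sample does not settle a universal statement over an infinite domain; for that, a proof is needed.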

There are further subcategories of universal and existential statements. The property P of a universal statement can also involve an existential quantifier and vice versa, as the following examples illustrate:

  • For all integers, there exists an additive inverse.

  • There exists a natural number such that all natural numbers are greater than or equal to that number.

The first statement is an example of a so-called universal existential statement (since the property that holds for all given objects, in this case integers, is about the existence of an object), while the second is an existential universal statement (e.g., Dubinsky & Yiparaki, 2000; Piatek-Jimenez, 2010). The involvement of existential quantifiers in universal statements (and vice versa) often only becomes apparent when the statement is expressed formally. For example, the universal statement “For all odd numbers a and b, \(a+b\) is even” can formally be written as \((\forall a,b \in \{z\in \mathbb {Z}| \exists k\in \mathbb {Z}: z=2k+1\}) (\exists l\in \mathbb {Z})(a+b=2l)\), and thus it is, more precisely, a universal existential statement. Regardless of how a statement is expressed, the potential presence of an existential quantifier in a universal statement becomes particularly relevant when proving it; in the example above, this is because of the definition of even (and odd) numbers:

Since a and b are odd, there exist \(k, m\in \mathbb {Z}\), such that \(a=2k+1\) and \(b=2m+1\). We therefore get: \(a+b=(2k+1)+(2m+1)=2k+2m+1+1=2k+2m+2=2(k+m+1).\) Since \(k+m+1\) is a whole number (thus, we found the required \(l\in \mathbb {Z}\)), \(a+b=2(k+m+1)\) is divisible by 2 and therefore even.
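The role of the existential quantifier in this proof is that it names an explicit witness, \(l=k+m+1\). A minimal sketch of this construction follows, with the hypothetical function name `even_witness` and an arbitrarily chosen sample range. It spot-checks the algebraic identity and does not replace the proof.

```python
def even_witness(k: int, m: int) -> int:
    """For a = 2k + 1 and b = 2m + 1, return the integer l with a + b == 2 * l."""
    return k + m + 1


# Spot-check the identity a + b = 2(k + m + 1) on a finite sample of integers.
for k in range(-50, 51):
    for m in range(-50, 51):
        a, b = 2 * k + 1, 2 * m + 1
        assert a + b == 2 * even_witness(k, m)
```

The loop confirms the identity for 10,201 pairs only; the general claim rests on the algebra in the proof, which holds for arbitrary integers k and m.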

The potential involvement of an existential quantifier in a universal statement (and vice versa) could be a particular obstacle for students in reading (or constructing) proofs, which teachers and lecturers should keep in mind.

The thesis at hand particularly aims at investigating first-year university students’ understanding of the generality of mathematical statements, more precisely, the understanding of the fact that if a universal statement is true, it is true without any exceptions. A universal statement would be disproved (or refuted) by finding just one counterexample. Thus, someone who is completely convinced that a universal statement is true, and who correctly understands the generality of statements, should be just as convinced that no counterexample to the statement exists.
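How a single counterexample outweighs any number of confirming cases can be illustrated with a classical example that is not taken from the text: the claim “\(n^2+n+41\) is prime for every natural number n”. The sketch below uses the hypothetical helper `first_counterexample` and an arbitrarily chosen search range.

```python
def is_prime(n: int) -> bool:
    """Trial division; sufficient for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_counterexample(claim, domain):
    """Return the first x in domain for which claim(x) is False, else None."""
    for x in domain:
        if not claim(x):
            return x
    return None


def claim(n: int) -> bool:
    # Illustrative universal claim: n^2 + n + 41 is prime.
    return is_prime(n * n + n + 41)


confirmed_up_to_39 = all(claim(n) for n in range(40))
counterexample = first_counterexample(claim, range(100))
print(confirmed_up_to_39, counterexample)
```

The claim holds for n = 0, ..., 39 but fails at n = 40, since \(40^2+40+41=1681=41^2\). Forty confirming cases do not establish the universal statement, and the one counterexample refutes it.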

This type of generality is characteristic of mathematics. In the following section, its particular meaning in contrast to other disciplines is clarified.

2.2 Generality in Mathematics

The significance of generality for mathematics and science in general was already recognized in ancient Greece. A corresponding criterion for generality can be found in Aristotle’s Posterior Analytics, according to which “any truly scientific demonstration should hold ‘for all’ the entities it concerns” (Rabouin, 2016, p. 113). Consequently, progress in mathematics has commonly meant achieving “ever higher levels of generality” (Chemla, Chorlay, & Rabouin, 2016, p. 3). However, there is no uniform meaning of (mathematical) generality. It can refer to a definition, a theorem, a method, or a type of reasoning, and different mathematicians have used different approaches to achieve different types of generality (Chemla et al., 2016). Since the focus of this thesis is on universal statements, I refer to mathematical generality (in German “Allgemeingültigkeit”) as the property of a statement holding for all objects of a given domain. In most cases, the expression “for all” in a mathematical statement refers to an infinite domain, and, according to Poincaré, the ability to obtain such generality with “the power of the mind” (Poincaré, 1952, p. 13), that is, by the construction of mathematical concepts and proof, is specific to mathematics (Ly, 2016). However, as Aristotle’s criterion indicates, generalizing is a goal of other sciences as well. What might come closest to the mathematical form of unrestricted generality is that of physical laws, e.g., Newton’s laws. But even these “cannot be demonstrated by conclusive reasoning” (Kneale, 1949, p. 21). The form of generality researchers seek in other sciences (e.g., biology and chemistry) is fundamentally different (Toulmin, 2003).

Biologists, for example, ... may not pursue formulations of unrestricted generality, but they are deeply committed to the search for formulations that we might describe as being of ‘restricted’ generality. Indeed, their science abounds in claims about general properties, and even statements that are often referred to as ‘laws’ .... But there is no domain in which these laws are presumed to be exception-free [emphasis added]. They are generalities, but not unrestricted generalities. It is evident that generality is valued in biology, but exceptions are neither a cause for alarm, nor do they necessarily send researchers back to the drawing board in search of better—exception-free—laws. Rather, they are reminders of how complex biological reality is. (Keller, 2016, p. 474)

Restricted generality is also particularly present in sciences such as sociology, where results are mostly being achieved by providing empirical (sometimes experimental) evidence for them. These conclusions hold in general, usually meaning they “have only high probability” (Kneale, 1949, p. 21). But there can be—and generally are—exceptions.

The assumption that it is commonly considered normal or even expected that there are exceptions to a (general) rule or statement can be further affirmed by the usage of the saying “The exception that proves the rule” (in German: Ausnahmen bestätigen die Regel). It “is often taken in the paradoxical sense of asserting that the presence of a counterexample establishes the general truth of a rule [emphasis added]” (Reid & Knipping, 2010, pp. 25–26), even though this would, in mathematics, disprove a rule or statement. The actual meaning of the expression might, however, be different. Since to “prove” comes from the Latin verb “probare”, which means to test, the expression can be interpreted as “the exception that tests the rule”, which suggests “that examining exceptions closely and reasoning out the way they occur can lead to a clarification and improvement of the rule.” (Reid & Knipping, 2010, p. 26).

In summary, statements that hold exception-free in a(n infinite) domain are unique to mathematics. The fact that universal statements are very rare (or nearly non-existent) outside of mathematics, together with the common view that there are typically exceptions to a rule, might influence students’ understanding and acceptance of the generality of mathematical statements. Furthermore, proof is essential for obtaining these types of general results, and this need for proof might not be obvious to students. Because of its essential role and further relevance for this project, I discuss the meaning of proof and its different notions in the following section.

2.3 What is Proof?

In mathematics, statements are only accepted as true after they have been proven in a way that meets certain standards. These standards vary over time and are based on so-called socio-mathematical norms (Dawkins & Weber, 2017; Yackel & Cobb, 1996), as Wilder (1981) emphasizes: “‘Proof’ in mathematics is a culturally determined, relative matter. What constitutes proof for one generation, fails to meet the standards of the next or some later generation” (p. 40). Even though there is no consistent definition of proof, certain characteristics and acceptance criteria can be identified.

In the following, I first outline major historical developments in the context of proof (for a detailed historical overview, see, e.g., Reid & Knipping, 2010). This provides the background required to better understand and reflect on the different views and usages of proof in recent literature, which I review in the following section. Furthermore, researchers have highlighted the potential relevance of historical developments of proof for the development of students’ proof conceptions (e.g., Harel & Sowder, 2007). Lastly, the main characteristics and acceptance criteria for mathematical proof are summarized and discussed.

2.3.1 Brief Historical Background

The idea to prove a statement—and not just provide evidence for it—originated in ancient Greece around 500 BCE (Reid & Knipping, 2010; Wußing, 2008):

For the early Egyptians, Babylonians, and Chinese, the weight of observational evidence was enough to justify mathematical statements .... But the classical Greek mathematicians found this way of determining mathematical truth or falsehood less than satisfactory (Hanna & Barbeau, 2002, p. 36)

They started to agree on definitions of fundamental ideas and axioms, on which they based their reasoning. In the 4th century BCE, the Greek philosopher Aristotle formulated what we now call the (axiomatic) deductive method in his Posterior Analytics (Anglin, 1994). The deductive method is a process of reasoning where each argument “has to be justified either by an axiom or by a previously proved theorem or by a principle of logic.” (Anglin, 1994, p. 63). The method shaped mathematics significantly, as it has become its defining characteristic (Anglin, 1994; Harrison, 2008; Reid & Knipping, 2010). Euclid’s Elements, which is a structured collection of the fundamental mathematical ideas of that time, is in this regard often considered as the most influential work of mathematical literature (Reid & Knipping, 2010; Wußing, 2008).

European mathematics can mainly be seen as a continuation of the work of the Greeks (Reid & Knipping, 2010). The 17th century CE “saw an explosion of mathematical activity” (Anglin, 1994, p. 161), which led to the discovery of important results, for example, by Newton and Leibniz. However, the methods often did not meet the strict standards for proof (Reid & Knipping, 2010), and many mathematicians at that time were dissatisfied with them. In the case of calculus, it took until the 19th century for a foundation based on precise definitions to be established (Reid & Knipping, 2010).

In the late 19th and early 20th century, a demand for the formalization of mathematical statements and proofs arose (e.g., Ketelsen, 1994; Reid & Knipping, 2010; Wußing, 2009). A first successful step in this direction was Frege’s Begriffsschrift from 1879, in which he developed axiomatic predicate logic (e.g., Sjögren, 2010). Further important developments include: Peano’s Formulaire de Mathématiques, which presents a purely formal structure for fundamental parts of mathematics; Russell’s and Whitehead’s Principia Mathematica, an (unsuccessful) attempt to build a complete foundation of mathematics based on axioms and logical rules of inference; the axiomatic set theory of Zermelo and Fraenkel (which was further extended by v. Neumann); and Hilbert’s Program (Ketelsen, 1994; Wußing, 2009). The latter was intended to completely formalize mathematics in axiomatic form and prove the consistency of this axiomatization (Wußing, 2009). Even though Kurt Gödel’s incompleteness theorems proved the impossibility of this endeavor as a whole (see Footnote 3), it had a significant impact on the development of mathematical logic and proof (e.g., Ketelsen, 1994; Wußing, 2009). As a consequence, these developments made it possible, as intended by Hilbert, to formulate and answer metamathematical questions within mathematics itself; new mathematical fields such as proof theory emerged (Rav, 1999; Sjögren, 2010; Zach, 2019). Furthermore, a more formal view of and approach to solving mathematical problems has since been established (Ketelsen, 1994). Regarding the future development of proof, Harrison (2008, p. 1395) argues that formalizing mathematics is a “natural further step ... towards greater clarity and precision.”

2.3.2 Different Views and Usages of the Term Proof

As the short historical overview emphasizes, the concept of proof has developed over time (and will most likely do so in the future) and depends on social norms specified by the mathematical community. In this section, two main views of proof which have grown historically are discussed.

One common view of mathematical proof, shared by many mathematicians, mathematics teachers, and students, is that of a so-called formal proof (see Footnote 4) in the sense of Hilbert (Lakatos, 1978; Tall et al., 2012; Weber, 2003). In this sense, “a mathematical proof is a formal and logical line of reasoning that begins with a set of axioms and moves through logical steps to a conclusion” (Griffiths, 2000, p. 2). Formal proofs can be expressed using first-order logic (e.g., Rav, 1999) and have the property that no interpretation by the reader is necessary to verify their validity. In fact, the correctness of such a proof can be checked mechanically by a computer program in finite time (Harrison, 2008; Lakatos, 1978; Rav, 1999). Because of these properties, formal proofs ensure rigor and reliability. They can be viewed as an idealization of proof (Hersh, 1993; Jahnke & Ufer, 2015; Manin, 2010; Sjögren, 2010; Sommerhoff, 2017). However, the same properties make formal proofs extremely long and very difficult to read, which renders them impractical in most mathematical fields (CadwalladerOlsker, 2011; Hanna, 1989; Jahnke & Ufer, 2015; Weber, 2003). In fact, most published proofs, for example in textbooks and articles in mathematical journals, are not purely formal: they are not expressed completely symbolically in first-order logic, and not all logical steps have been explicitly traced back to axioms (e.g., Rav, 1999; Tall et al., 2012). Aberdein (2009) further emphasizes that “this [referring to formalizing] is not something that mathematicians routinely do” (p. 1). The main mathematical fields in which formal proofs do play an important role are mathematical logic (specifically proof theory) and the foundations of mathematics.

Because the majority of proofs are not (yet) completely formal (see Footnote 5), Thurston (1991) and other modern mathematicians and mathematics educators argue that it is important to explicitly “distinguish between formal proofs and proofs that mathematicians actually construct” (Weber, 2003, para. 2) and publish. The latter can be described as “social conventions by which mathematicians convince one another of the truth of theorems” (Buss, 1998, p. 2). They “are written in a way to make them easily understood by mathematicians” (Hales, 2008, p. 1371). In contrast to formal proofs, routine steps are omitted in these proofs, and readers have to interpret the context and translate intuitive arguments into more rigorous ones (Hales, 2008). There is no clear consensus in the literature on whether these proofs are, or should be, formalizable, at least in principle. Bass (2009) argues that for mathematicians, an argument is convincing if their peer experts feel “empowered [...], given sufficient time, incentive, and resources, to actually construct a formal proof” (p. 3). However, Lakatos (1978) gives an example of a proof of Euler’s theorem on simple polyhedra, about which he states that “there does not seem to be any feasible way to formalize this reasoning” (p. 64). Nevertheless, he is convinced “that mathematicians would accept this as a proof” (Lakatos, 1978, p. 64).

By renouncing rigor and complete formalism, these proofs enable mathematicians to focus on understanding and imparting the underlying mathematical concepts and ideas. However, the degree of (in)formality and the use of intuitive arguments varies a lot in these proofs. For example, proofs in (abstract) algebra are usually more formal than those in geometry or topology, which quite often rely on intuitive arguments (Hales, 2008; Sjögren, 2010).

In the last decades, various terms have been introduced to separate formal proofs and proofs that mathematicians actually construct. For instance, Douek (2007) distinguishes between mathematical proofs and formal proofs. However, the usage of the term mathematical proof is not consistent in the literature, which can be confusing if the context is unclear. In contrast to Douek, Griffiths (2000) means formal proofs when using the term mathematical proof, as already noted above. The term mathematical proof is also often used more generally as a superset, which comprises any kind of proof, thus including formal proofs (e.g., CadwalladerOlsker, 2011; Lakatos, 1978; Sjögren, 2010; Tall et al., 2012).

Many classifications in the literature highlight two main contrasting purposes of formal proofs and proofs mathematicians actually construct: gaining absolute truth versus convincing others and understanding the underlying mathematics. In this sense, Davis, Hersh, and Marchisotto (2012) differentiate between proofs of metamathematics and real mathematics, and Recio and Godino (2001) between foundation of mathematics and mainstream mathematics. Proofs of the latter kind are sometimes called mainstream (mathematical) proofs (e.g., Harrison, 2008), ordinary (mathematical) proofs (see Footnote 6; e.g., Tall et al., 2012), or practical (mathematical) proofs (e.g., Hersh, 1997), to emphasize that these are the sort of proofs commonly produced in mathematical practice. Hales (2008) uses the term traditional (mathematical) proofs, which refers to the historical development of proof and the fact that purely formal proofs only recently became more relevant in mathematical research (see Section 2.3.1). To highlight the influence of socio-mathematical norms, Buss (1998) uses the term social proof. However, the usage of this term has not (yet) been established in the literature, especially in mathematics education.

Another way to distinguish ordinary proofs from formal proofs is the use of terms that refer to a lesser degree of formality. The usage of these terms is not always consistent. In the literature on (the philosophy of) mathematics, ordinary proofs are commonly called informal proofs as the opposite of formal proofs (Dawson, 2006; Marfori, 2010; Sjögren, 2010; Tanswell, 2015), even though ordinary proofs are usually not completely informal. Lakatos (1978) criticizes that proofs are often misleadingly called informal, even though “a competent logician ... can formalize any such proof without too much brain-racking” (p. 63). He suggests calling proofs that are not completely formal but formalizable formal proofs with gaps or quasi-formal proofs, because they are simply “incomplete formal proofs” (Lakatos, 1978, p. 63). To describe proofs that are “truly informal”, Lakatos uses the term pre-formal (see Footnote 7). To emphasize a lesser degree, but not complete absence, of formality, Reid and Knipping (2010) call ordinary proofs semi-formal.

I understand mathematical proof as a deductive form of argumentation, accepted by the mathematical community, that comprises a spectrum of formality from purely formal to the complete absence of formality, which I refer to as non-formal. I assume that non-formal proofs are rather rare and that most published mathematical proofs (the ordinary ones) are neither non-formal nor formal but somewhere in between (see Figure 2.1). All these definitions of ordinary proofs are rather broad and vague. While formal proofs can be precisely defined, it is indeed not possible to define exactly what constitutes a valid ordinary proof (Buss, 1998; Davis et al., 2012; Lakatos, 1978; Sjögren, 2010). One reason already mentioned above is the significance of socio-mathematical norms and expectations of the mathematical community for proving practices, which the following well-known quote from Manin (2010) illustrates: “A proof [in the social sense] only becomes a proof after the social act of ‘accepting it as a proof’” (p. 45). As already noted at the beginning of this chapter, these “standards of acceptability are changeable and subject to different constraints which vary according to different variables” (Mariotti, 2006, p. 176).

Figure 2.1 Spectrum of mathematical proof by degree of formality

The lack of a precise definition for (ordinary) proof complicates deriving clear instructional implications for the teaching of proof. However, several acceptance criteria of the mathematical community and characteristics of proofs, which are seen to be useful for the teaching and learning of proof and argumentation, have been discussed in the literature. These are outlined in the following section.

2.3.3 Characteristics and Acceptance Criteria for Proof

Even though a precise definition for (ordinary) proof does not exist, it is important to agree on a conceptualization within mathematics education; otherwise “it is difficult ... to meaningfully build upon each other’s research and it is impossible to judge if pedagogical goals related to proof are achieved” (Weber, 2014, p. 353). However, such an agreement has not been reached yet (Balacheff, 2002; Reid & Knipping, 2010; Weber, 2014). Although there is no uniform conceptualization of proof, a shared understanding seems to be that the definition of formal proof is not very useful for mathematics education (see, e.g., CadwalladerOlsker, 2011; Hanna, 1989; Weber & Czocher, 2019), because (1) the arguments mathematicians refer to as proofs, i.e., ordinary proofs, are in general not formal (with good reasons); and (2) the underlying concepts and ideas we want students to understand are disguised in formal proofs.

In 2007, Stylianides proposed a characterization of proof (see Footnote 8), which is often cited and used in mathematics education research. He views proof as a mathematical argument containing...

  1. ... a set of accepted statements, e.g., definitions, axioms, theorems, etc.;

  2. ... valid and known forms of reasoning (which he calls modes of argumentation), e.g., application of logical rules of inference, use of definitions, construction of counterexamples, etc.;

  3. ... appropriate and known forms of expression (which he calls modes of argument representation), e.g., linguistic, physical, pictorial, symbolic, etc.

According to this definition, proof depends heavily on the context and on the individuals who construct or evaluate the proof, and thus it corresponds to a social view of proof. In particular, familiarity with different aspects of proof (known forms of reasoning and expression) seems to be an essential characteristic for the acceptance of proof in Stylianides’ definition.

Weber (2014) views proof slightly differently. He argues that the frequently used approach of defining proof by identifying characteristics that are shared by all proofs but not by other arguments has been unsuccessful, because there is no “consensus on which ... properties capture the essence of proof” (p. 353). Consequently, he suggests viewing proof as a so-called clustered concept consisting of the following six models: Proof as

  1. a convincing argument.

  2. a transparent argument where a mathematician can fill in every gap.

  3. a deductive argument.

  4. a perspicuous argument that provides an understanding of why a theorem is true.

  5. an argument within a representation system satisfying communal norms.

  6. an argument that has been sanctioned by the mathematical community.

Weber (2014) admits that these features have been stated before, but he claims that what is original in his approach is that none of “these more basic models” (p. 358) can completely characterize proof by itself. However, he states that “it would be desirable for proofs to satisfy all six criteria”. Furthermore, proofs that fit into all models should not be controversial, whereas arguments that fit into only some of the models might either be disputed or nevertheless qualify as proofs. The models 5. and 6. seem to correspond to the second and third property of the conceptualization given by Stylianides (2007). Further, the conceptualization of Weber (2014) contains several goals (or functions) of proof (see Footnote 9), in particular proving as convincing someone of the truth of a statement and proving as explaining (to provide insight into why the statement is true), which are both often identified as the main goals of proof (e.g., Hersh, 1993). The latter is often seen as particularly important for the teaching of proof and proving in school (e.g., Brunner, 2014; Hanna, 2000). In my understanding, the models are not all independent of each other. For example, models 2. to 6. can be viewed as influencing factors for being a convincing argument (i.e., for model 1).

Even though the two approaches to conceptualizing proof discussed above differ, they still contain similar features. For instance, they both refer in some sense to convincing and accepted arguments. What remains unclear is which arguments students should find convincing or accept as proof. A common approach to answering this question is to investigate whether (or to what degree) mathematicians agree on which forms of arguments and representations are valid and appropriate. In this regard, Weber and Czocher (2019) differentiate between two positions: “The consensus view on proof asserts that mathematicians agree on which inferential schemes are permissible in a proof; the pluralistic view holds that mathematicians disagree on which inferential schemes are permissible” (p. 254). Hanna and Jahnke (1996) have argued that acceptance criteria that are shared by all mathematicians do not exist, which several studies confirm (e.g., Inglis & Alcock, 2012; Inglis & Mejía-Ramos, 2013; Weber, 2008, see also Section 3.2.3). However, Weber and Czocher (2019) found that there seem to be “three categories of inferential schemes”: standard methods in “typical proofs”, on which mathematicians agree; “invalid schemes” such as empirical arguments (see Footnote 10), on which mathematicians also agree; and “controversial schemes whose permissibility is unclear”, for example, computer-assisted proofs and visual arguments (p. 264). Thus, disagreement might occur almost exclusively regarding atypical proofs. Apparently, familiarity with a mathematical argument is indeed a major factor in its acceptance.

However, apart from the existence of a proof and the acceptance of it, there are other criteria that influence mathematicians’ conviction of the truth of a statement. According to Hanna (1989), mathematicians accept new or unfamiliar theorems by a combination of the following criteria:

  1. They understand the theorem, the concepts embodied in it, its logical antecedents, and its implications ...;

  2. The theorem is significant enough to have implications in one or more branches of mathematics ...;

  3. The theorem is consistent with the body of accepted mathematical results;

  4. The author has an unimpeachable reputation as an expert in the subject matter of the theorem;

  5. There is a convincing mathematical argument for it (rigorous or otherwise), of a type they have encountered before. (pp. 21–22)

Notably, only the last criterion explicitly refers to proof. Moreover, the first three criteria highlight the importance of understanding the theorem and its implications for its acceptance. In line with Stylianides (2007), Hanna (1989) characterizes a convincing argument as one with which the reader is familiar, as she states that it is an argument “they have encountered before” (p. 22), i.e., in other proofs. This is of direct importance for the teaching of proof at the transition from school to university, because students usually do not gain extensive experience with proof and proving during high school (e.g., Hemmi, 2008; Kempen & Biehler, 2019). Thus, they might not have strong conceptions regarding which forms of reasoning and representation are appropriate. Research findings on students’ conviction and acceptance of different types of arguments are reviewed in Section 3.2.3.

The list above also emphasizes that the acceptance of an argument is indeed not a necessity for the acceptance of the truth of a statement, but only one component. For instance, the fourth factor introduced by Hanna (1989), namely the author’s reputation or authority, might also be of particular relevance for the acceptance criteria students actually use. It can be convincing for them when teachers or textbooks state that some statement is true (e.g., Harel & Sowder, 1998; Tall et al., 2012). Moreover, students may ask themselves “why is it necessary to prove something that is known to be true?” (Tall, 1989, p. 29).

It seems that even mathematicians sometimes rely on authoritarian arguments when estimating the truth of a statement, in particular if the statement lies outside their field of specialty and high levels of uncertainty regarding a given argument are involved (Harel & Sowder, 1998; Inglis & Mejia Ramos, 2009). However, according to a study conducted by Heinze (2010), the reputation of an author does not often influence mathematicians’ conviction of the truth of a theorem (but sometimes it does). In contrast, the theorem being checked and used by “other mathematicians with high standards” or the theorem existing “for a long time and no contradiction has been found” (p. 106) are both criteria mathematicians claim to employ when deciding if a mathematical statement is true.

The different approaches and criteria reviewed in this and the previous sections illustrate the complexity of proof and the resulting challenges for teaching practices. In regard to the learning of proof and proving, argumentation and reasoning skills are often viewed as essential. Although these terms are used quite extensively in mathematics education, there is no shared understanding of their meaning. In the following section, different usages of these terms as well as their relation to each other and to proving are examined. In this regard, I also discuss several types of arguments that are common in mathematics education.

2.4 Reasoning, Argumentation, and Proving in Mathematics Education

The terms argumentation, reasoning, and proving or proof are widely used in mathematics education literature as well as in (national) curricula (e.g., Kultusministerkonferenz, 2012; National Council of Teachers of Mathematics, 2000). For instance, the German national curricula (“Bildungsstandards”) for higher secondary schools state:

This competence [referring to mathematical argumentation] includes both the development of independent, situation-appropriate mathematical argumentations [emphasis added] and conjectures and the understanding and evaluation of given mathematical statements. The spectrum ranges from simple plausibility arguments to operative reasonings [emphasis added; in German inhaltlich-anschauliche Begründungen] to formal proofs [emphasis added] (Kultusministerkonferenz, 2012, p. 14, translated by the author)

As with the term “proof”, there is no unique definition for the terms (mathematical) argumentation and reasoning, and a shared understanding regarding their relation to each other and to proving has not been reached yet in mathematics education (e.g., Balacheff, 1999; Reid & Knipping, 2010; Stylianides, 2016). In the following, I give a short overview of the main understandings of these terms and their relationship to emphasize differences and to avoid confusion. The terms and definitions on which the framework of this project is based are then clarified.

2.4.1 Definition of and Relation between Reasoning, Argumentation, and Proving

Among the terms reasoning, argumentation, and proving, reasoning may be the term least discussed in mathematics education literature. Two different understandings can nevertheless be identified: reasoning as the most fundamental activity of drawing conclusions (e.g., Duval, 1991) and reasoning as a specific form of argumentation (e.g., Hefendehl-Hebeker & Hußmann, 2003; Reiss, Hellmich, & Thomas, 2002). Following the first view, reasoning does not necessarily have to have the goal to convince someone of the truth of a (controversial) statement. Rather, it can simply be used to provide contextual explanation, e.g., to answer questions like “How have you come to be in a position to speak about [or know] this” (Toulmin, 2003, p. 199). The answer might consist of a biographical reason, for instance, “I know how to make toffee because my mother taught me” (Toulmin, 2003, p. 199), or a reference to authority (e.g., “My teacher said so”).

In contrast, the second view is based on the understanding that argumentation rather than reasoning is the elementary activity of which reasoning is a specific form, namely one that is (logically) consistent (Reiss et al., 2002, p. 51). According to this view, the difference between reasoning and proving—the latter being the process of constructing a mathematical proof (e.g., Douek, 2007)—results from different modes of argumentation and degree of formality (Reiss & Ufer, 2009).

In contrast to reasoning, different views of the concept of argumentation have been discussed in detail (see, e.g., Kirsten, 2021; Mariotti, 2006; Pedemonte, 2007; Reid & Knipping, 2010). A shared understanding seems to be that argumentation (in general, not specifically mathematical) is a discursive activity, usually consisting of a sequence of inferences rather than a single argument (e.g., Douek, 2007; Toulmin, 2003), being used to convince someone of the truth of a statement (Duval, 1999; Krummheuer, 1995).

There are several reasons for differences among researchers regarding their view of the relation between argumentation and proving, including different foci of characteristics of argumentation and different conceptualizations of proof. In this regard, Balacheff (1999) argues that different conceptions of argumentation can lead to the conclusion that argumentation is either an obstacle to or a continuous path toward the learning of mathematical proof. To “provide a system of benchmark” (Balacheff, 1999, p. 3), Balacheff compares the views of three authors: Perelman, Toulmin, and Ducrot. According to Perelman (1970), argumentation is not mainly about establishing “the validity of a statement” but about “its capacity to convince” (Balacheff, 1999, p. 3) someone. This view may contradict a continuity position of argumentation and proof, because proof, particularly in its formal sense, can most certainly not be reduced to being convincing, even though this is one of its characteristics. On the other hand, in Toulmin’s view, the main characteristic of argumentation is the reliance of its validity on a structure accepted by a community (Toulmin, 2003). While Toulmin acknowledges different methods of argumentation being used in different fields (e.g., logic) and by “everyday arguers” (Toulmin, 2003, p. 37), his view of argumentation nevertheless contains main characteristics of mathematical proof. Therefore, argumentation and proof could be understood as a continuity, a view shared, for instance, by Boero, Garuti, and Mariotti (1996). Lastly, for Ducrot (1980) argumentation is the core of discourse and connecting words are essential for the imparting of an argument: “The analysis of conjunctions (connecting words) has a particular importance for Ducrot because it is they which make the information contained in a text subject to its global argumentative intention” (Balacheff, 1999, p. 3). Within Ducrot’s framework, as with Perelman’s, a continuous view of the relation between argumentation and proof “appears doubtful” (Balacheff, 1999, p. 4). Furthermore, such a conception might lead to the conclusion of argumentation being an obstacle for the learning of proof, as Duval (1991) highlights:

Deductive thinking does not work like argumentation. However these two kinds of reasoning use very similar linguistic forms and propositional connectives. This is one of the main reasons why most of the students do not understand the requirements of mathematical proofs (p. 233)

While acknowledging similarities in linguistic and grammatical aspects, Duval understands argumentation and proof as two forms of reasoning (namely argumentative reasoning and deductive reasoning; see Footnote 11) but simultaneously as fundamentally separate activities. Regardless of whether or not one shares this view, it should not be dismissed that general argumentation (e.g., in everyday situations) might negatively affect students’ understanding of argumentation in mathematics. For instance, as discussed in Section 2.1, a single counterexample suffices to disprove a universal statement. However, in other sciences and in everyday situations, not only would a counterexample not automatically refute the whole statement, it might even be expected that it exists (see Section 2.2).

The notion of mathematical argumentation is widely used in the literature and often includes not only activities regarding the verification of a statement, but a broader spectrum of mathematical activities, such as investigating conjectures and open-ended questions (e.g., Reiss & Ufer, 2009). However, the existence of mathematical argumentation as a particular form of argumentation that is unique to mathematics, but not part of mathematical proof, is seen as controversial. It depends on the researcher’s view of both argumentation and proof. Someone with a formal view of proof would most likely be able to find argumentations which do not qualify as a part of proving, but that are specific to mathematics, for example, visual arguments, which are controversial, as noted in Section 2.3.3. In contrast, someone with a broader (i.e., social or ordinary) view of mathematical proof might find it more difficult to identify such examples of argumentations. Balacheff (1999) adopts the latter position. He argues “that there is no mathematical argumentation in the frequently suggested sense of an argumentative practice in mathematics which is characterized by the fact that it escapes certain of the constraints present for mathematical proof” (p. 4). This does not imply that argumentation in mathematics does not exist, but these argumentative methods “could be used elsewhere” and would “disappear in the construction of a discourse acceptable with regard to the rules specific to mathematics” (Balacheff, 1999, p. 5). Prominent examples are plausibility arguments, in particular empirical arguments, which are discussed further in the following section.

In this thesis, I follow the understanding that reasoning is a fundamental activity of which argumentation is a specific form. Unlike Duval (1991), I do not view proving as being fundamentally different from argumentation, but rather as a subset of argumentation that meets specific criteria (namely those discussed in Section 2.3.3). Like Balacheff (1999), and in line with a broader view of mathematical proof, I take the position that mathematical argumentation, as an activity being completely different from proving, does not exist.

Another term that is often used in the context of proof and proving (e.g., Lesseig et al., 2019; Mejía Ramos & Inglis, 2009b; Yackel & Cobb, 1996), but is less discussed in mathematics education literature, is justification or justify (Staples & Conner, 2022). While it is mainly clear what is expected from students when asked to prove a theorem or statement, namely, to construct a(n ordinary) proof (see above), it is less obvious what is exactly meant when students are asked to justify (why) a statement (is true or false) (Dreyfus, 1999). In the context of mathematical argumentation, it usually refers to providing sufficient “mathematical evidence in support of a result” (Staples & Conner, 2022, p. 5). In this thesis, I adopt the following definition given by the National Research Council (2001):

We use justify in the sense of ‘provide sufficient reason for.’ Proof is a form of justification, but not all justifications are proofs. Proofs (both formal and informal) must be logically complete, but a justification may be more telegraphic, merely suggesting the source of the reasoning. (p. 130)

In this broader sense, justification is therefore particularly relevant for the mathematics classroom and can serve as “on-the-way-to-proof reasoning practices” (Staples & Conner, 2022, p. 5; see also Dreyfus, 1999). Similar to proof, what is considered sufficient in this sense needs to be negotiated within the respective community.

In the following section, I describe several types of arguments that are particularly relevant for the teaching and learning of proof and proving in school and that are widely referred to in mathematics education literature.

2.4.2 Types of Arguments

Several argumentation and proof concepts (see Footnote 12) have been identified and discussed in mathematics education, specifically in regard to teaching proof and proving in school (e.g., Biehler & Kempen, 2016; Brunner, 2014; Reid & Knipping, 2010). In particular, the following three main types of “proofs” are often distinguished, especially in the German literature:

  • experimental proof

  • operative proof (in German also inhaltlich-anschaulicher Beweis)

  • formal-deductive proof

Reference is usually made to Wittmann and Müller (1988), although they originally cite Branford (1913) (a German translation of the English publication from 1908). Branford (1908) introduces the terms “experimental evidence or proof”, “intuitional evidence or proof” and “scientific evidence or proof” (p. 233).

Not all of these types of arguments classify as mathematical proof, neither in a formal nor in an ordinary view (both as defined in Section 2.3.2). Experimental proof, sometimes also called experimental verification (e.g., Kunimune et al., 2009), refers to a form of argumentation based on empirical evidence for a claim. As such, it cannot establish general truth and is therefore not a valid scheme for mathematical proof, as discussed in Section 2.3.3. Further, this type of argument is not specific to mathematics, as giving examples is a common form of argumentation in everyday situations. Other terms that are used in the literature to describe a form of reasoning where a conclusion is drawn from the observation or verification of a (small) number of cases are naive empiricism (Balacheff, 1988b) or empirical arguments (e.g., Reid & Knipping, 2010). In this thesis, I use the latter term. Even though empirical arguments do not ensure the non-existence of counterexamples, and thus the generality of the statement, they are nevertheless and without doubt relevant for mathematical practice, for instance, to examine patterns and to explore or test conjectures. A good overview of several functions of experimentation in mathematics is provided by de Villiers (2010).
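The limitation of empirical arguments can be illustrated with a well-known example that is not taken from the studies cited above: the expression n² + n + 41 yields a prime number for n = 0, ..., 39, but not for n = 40. The following minimal sketch checks this by trial division; the function names are illustrative and chosen only for this example.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def euler_value(n: int) -> int:
    """Value of the expression n^2 + n + 41."""
    return n * n + n + 41


def first_counterexample(limit: int):
    """Smallest n in [0, limit) for which n^2 + n + 41 is not prime, or None."""
    for n in range(limit):
        if not is_prime(euler_value(n)):
            return n
    return None


# Verifying the first 40 cases reveals no counterexample ...
print(first_counterexample(40))   # None
# ... yet the universal claim is false: 40^2 + 40 + 41 = 1681 = 41 * 41.
print(first_counterexample(100))  # 40
```

Forty confirming cases might appear convincing, yet a single further case refutes the universal statement. At the same time, such a check is a legitimate way to explore or test a conjecture.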

Operative proofs are based on specific observations, but in contrast to empirical arguments, they reveal a structure that can be generalized to hold for a whole class of objects (Wittmann & Müller, 1988). In the understanding of Blum and Kirsch (1989), this process of generalizing should consist of correct (non-formal) inferences and the underlying idea of why it is generalizable should be grasped intuitively. The latter emphasizes Branford’s usage and understanding of the term “intuitional proof”. Figure 2.2 provides an example of such a proof for the statement “the sum of any two odd numbers is always even”. The explanation above the figure is not necessarily required. However, some researchers argue that with regard to the generality of the argument, valid explanations should be included; otherwise it could not be classified as proof (e.g., Kempen & Biehler, 2019).

Every odd number can be grouped into pairs (of twos), such that exactly one is left. When two odd numbers are added, the two ones that are left can be grouped into a pair, such that the sum now only consists of pairs (of twos).

Figure 2.2 Example of an operative (or generic) proof

Another term that is widely used in the international literature to describe a similar type of proof is generic proof (e.g., Bass, 2009; Dreyfus et al., 2012; Rowland, 2001) or generic example and thought experiment (Balacheff, 1988b; Mason & Pimm, 1984). Movshovitz-Hadar and Malek (1998) introduced the term transparent pseudo proof to highlight two main properties: that these types of arguments are not proofs in a formal sense (thus pseudo), but that “[one] can ‘see’ the formal proof through it” (Malek & Movshovitz-Hadar, 2011, p. 37), thus transparent, like glass. Different researchers highlight and define different features of these types of arguments (for more details, see Reid & Knipping, 2010). Regardless of the differences and terms being used, these types of arguments are often seen as an opportunity for the teaching and learning of proof to understand underlying mathematical concepts and ideas without the difficulties of (seemingly) complicated mathematical language:

A generic proof aims to exhibit a complete chain of reasoning from assumptions to conclusion, just as in a general proof; however, ... a generic proof makes the chain of reasoning accessible to students by reducing its level of abstraction; it achieves this by examining an example that makes it possible to exhibit the complete chain of reasoning without the need to use a symbolism that the student might find incomprehensible (Dreyfus et al., 2012, p. 204)

Depending on their specific definition and with respect to a broader understanding of mathematical proof, generic arguments may qualify as valid mathematical proof and could be placed at the non-formal end of the spectrum (see Fig. 2.1 in Section 2.3.2). This would be in line with Wittmann (2014), who states that operative and formal proof (due to his explanations, I assume he actually refers to ordinary proofs) do not differ fundamentally, but only in the form of argumentation and its representation (e.g., one uses iconic, the other mainly symbolic representations). In contrast, in a formal sense, generic arguments would most certainly not qualify as proof. As noted in Section 2.3.3, visual arguments in mathematics (which are often generic) are seen as controversial among mathematicians.
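To illustrate the difference in representation, the statement “the sum of any two odd numbers is always even” can also be written symbolically. The following is a minimal sketch of such a symbolic counterpart to the operative argument; it is added for illustration and not taken from the cited sources, and the symbols m, n, a, and b are chosen for this example only.

```latex
Let $m = 2a + 1$ and $n = 2b + 1$ with $a, b \in \mathbb{Z}$. Then
\[
  m + n = (2a + 1) + (2b + 1) = 2a + 2b + 2 = 2(a + b + 1),
\]
which is even, since $a + b + 1 \in \mathbb{Z}$.
```

The two summands 1 correspond to the two ones that are left over in the operative argument, and factoring out 2 corresponds to grouping them into a further pair.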

The term formal-deductive proof is sometimes used more or less synonymously for formal proof and should as such not be controversial. However, it appears doubtful that Wittmann and Müller (1988) explicitly wanted to refer to formal proof rather than ordinary proof, as they cite Branford (1913), who uses the term “scientific proof”. Wittmann and Müller (1988) thus describe formal-deductive proofs as those proofs that are constructed and published by professional mathematicians. Wittmann and Müller argue that formal(-deductive) proofs are too complex and sophisticated, and therefore an obstacle for the learning of proof and proving in school. Regarding the formal sense of proof, I agree (as most researchers do, see Section 2.3.3). However, I do not agree regarding a broader understanding of proof (i.e., ordinary proof) with its characteristics as discussed in Section 2.3.3. Learning about these characteristics seems to be essential to gain an appropriate understanding of proof and to ease the transition to university.

In the framework of this project, I distinguish between empirical arguments and generic and ordinary proofs, in the understanding discussed above. Further types of arguments that are used by students and are relevant in this thesis (so-called proof schemes) are discussed in Section 3.2.5.