The previous chapter illustrated how argumentation and proof, and in particular generality, are fundamental for mathematics, but also that defining what constitutes mathematical proof is not that simple. Because of their central role in mathematics, proof and argumentation are seen by many researchers as essential for the learning of mathematics in school as well as at university (e.g., Hanna, 2000; Schoenfeld, 2009), and national curriculum frameworks around the world have specified respective learning goals (e.g., Department of Basic Education, 2011; Kultusministerkonferenz, 2012; National Council of Teachers of Mathematics, 2000). Consequently, proof and argumentation have been researched extensively in mathematics education, in particular in the last three decades (Hanna, 2000; Hanna & Knipping, 2020; Sommerhoff, Kollar, & Ufer, 2021). About 16% of all PME research reports published between 2010 and 2014 focused on proof and argumentation in secondary or tertiary education (20% in total, including primary education), as Sommerhoff et al. (2015) report. The majority of these studies was qualitative (57%) and had no more than 100 participants, which highlights the need for further large-scale quantitative studies. Research foci in mathematics education range from theoretical investigations of the importance and relevance of proof for the teaching of mathematics to empirical research on students’ and teachers’ proof skills regarding, for instance, the understanding, evaluation, and construction of proofs (see, e.g., Mejía Ramos & Inglis, 2009a; Sommerhoff et al., 2015). Numerous researchers have reported difficulties with proof and proving across several of these activities (e.g., Dubinsky & Yiparaki, 2000; Harel & Sowder, 1998; Healy & Hoyles, 2000; Kempen, 2019; Recio & Godino, 2001; Weber, 2001).
In particular, more attention has recently been given to empirical research on proof and argumentation in higher education with a focus on the transition from school to university and first-year university students (e.g., Alcock et al., 2015; Gueudet, 2008; Hanna & Knipping, 2020; Kempen & Biehler, 2019; Moore, 1994; Rach & Ufer, 2020; Recio & Godino, 2001; Sommerhoff, 2017; Stylianides & Stylianides, 2009; Stylianou et al., 2006).

With respect to the present research interest, the focus of this chapter is on research findings regarding (first-year) university students and (preservice) mathematics teachers, but wherever appropriate, findings regarding secondary school students are considered as well. The chapter is divided into three parts. Because of its central role for this thesis, research on the understanding of the generality of mathematical statements and proofs as well as the role of (counter-)examples is reviewed first, in section 3.1. In section 3.2, different activities related to proof and proving are then discussed and research findings regarding activities that are relevant for the present thesis are summarized. Lastly, in section 3.3, resources that may influence individuals’ performance in proof-related activities are discussed.

3.1 Understanding Generality and the Role of (Counter-)Examples

Generality as an essential characteristic of mathematical statements and proof has been repeatedly highlighted as an important learning goal in the literature on proof and argumentation (e.g., Conner, 2022; Ellis et al., 2012; Fischbein, 1982; Kunimune et al., 2009; Lesseig et al., 2019). To prove the generality of a (universal) statement (as defined in section 2.1), a general deductive argument needs to be constructed. Awareness and understanding of this requirement of proof is usually what researchers mean when they refer to understanding the generality of proof (e.g., Conner, 2022). While most likely equally important for the learning of proof and proving, understanding the generality of statements has not been explicitly defined yet. Whenever I refer to the understanding of the generality of statements, I mean the understanding that true universal statements hold for all elements in the given domain–without any exceptions (see also section 2.2), i.e., that no counterexamples to the statement exist. Research on students’ (and teachers’) understanding of generality has mainly focused on the generality of proof, occasionally in relation to understanding the generality of statements.
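Purely as an illustration of this notion (the formalization is mine and is not taken from the cited frameworks), the generality of a universal statement can be written in first-order form:

```latex
% A true universal statement holds for every element of its domain, e.g.,
% "the sum of two odd integers is even" becomes
\forall\, m, n \in \mathbb{Z}:\; \big(m \text{ odd} \,\wedge\, n \text{ odd}\big) \implies m + n \text{ even}
% Understanding the generality of the statement means understanding that its
% truth leaves no room for exceptions, i.e., that no counterexample exists:
\neg\, \exists\, m, n \in \mathbb{Z}:\; m \text{ odd} \,\wedge\, n \text{ odd} \,\wedge\, m + n \text{ odd}
```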

For instance, the framework for students’ understanding of proof by Kunimune et al. (2009) consists of the two aspects construction of proof and generality of proof (see p. 442). According to Kunimune et al. (2009), one necessity for students’ understanding of “generality of proof” is the understanding of the universality and generality of statements. They do not further clarify how the understanding of universality and generality of statements and proof differ; in particular, they do not explicitly define what constitutes understanding of the generality of statements. However, they introduce levels of students’ proof understanding. For example, regarding the generality of algebraic proof, students on level 0 “do not understand what they have to explain”. It is unclear whether this corresponds to an insufficient understanding of the generality of a statement. One could argue, however, that students who are aware that the correctness of a statement has to be explained in some way may at the same time be unaware of what it means for a universal statement to be correct, namely, that no counterexamples exist. Those students would then not have a complete understanding of the generality of a statement. The study of Kunimune et al. (2009) consisted of written survey questions, which were answered by 418 lower secondary students (Grades 8 and 9) from Japan. Kunimune et al. (2009) conclude that participants who were able to construct proofs did not necessarily value the generality of proofs; they seemed to believe that empirical arguments are an equally acceptable way to prove a statement. In this regard, Stylianides (2016) concludes that students’ “inability to recognize the generality of proof may suggest that [they] do not conceive of proof as a means for establishing truth” (p. 320).
If this is the case, understanding the generality of statements and proof might influence students’ intellectual need (see, e.g., Harel, 2013) and understanding of the necessity for proof.

Lesseig et al. (2019) have also explicitly included the generality of statements and proof in their framework. They investigated preservice secondary mathematics teachers’ understanding of proof. According to their framework for mathematical knowledge for teaching proof, teachers should have the following “essential proof understandings:

  • A theorem has no exceptions [emphasis added]

  • A proof must be general [emphasis added]

  • Proof is based on previously established truth

  • The validity of a proof depends on its logic structure” (p. 396)

The first aspect corresponds to an understanding of the generality of universal statements and the second to the understanding of the generality of proof. In their pilot study, 34 students from the USA, Australia, and Korea completed the survey. Regarding the generality of statements and proof, Lesseig et al. (2019) found that 30% of the participants considered generality when evaluating proofs, whereas 20% did so when asked to identify requirements of proof. Further, the researchers argue that “merely knowing that a proof must be general was not necessarily sufficient, as [the teachers] had different interpretations of what constituted generality” (p. 413).

As has been pointed out at the beginning of this section, no clear definition of understanding generality, in particular of statements, and of how to assess this understanding has been given yet. Regarding the understanding of the generality of proof, Conner (2022) has recently provided a framework to identify students’ (limited) understanding of generality, which she refers to as understanding the generality requirement, meaning “the requirement that a proof must demonstrate the claim to be true for all cases indicated within the domain of the claim” (p. 2). Her framework aims to assess students’ understanding of this requirement in two proof-related activities, the construction and the evaluation of arguments. Through a case study, she identified instances that demonstrate an understanding of generality and instances that demonstrate limited understanding. For instance, with respect to the evaluation of arguments, according to Conner’s framework, students who refer to the need for a general argument demonstrate understanding, and students who refer to the inclusion of examples as proof demonstrate limited understanding. Further, the usage of empirical arguments to prove a universal statement was also identified as an instance that shows limited understanding of generality. However, several researchers have argued that students’ usage of, or conviction by, empirical arguments does not necessarily relate to an insufficient understanding of generality (e.g., Healy & Hoyles, 2000; Weber, Lew, & Mejía-Ramos, 2020), which Conner herself noted. Students’ conviction and usage of empirical arguments are further discussed in sections 3.2.3 and 3.2.5. Apart from the role of examples in justification and evaluation tasks, Conner (2022, p. 6) also included the usage and interpretation of variables (as “generalized numbers” vs. as “placeholders for specific values”) and the notation and interpretation of diagrams (e.g., “representing all possible cases” vs. “showing a specific case”) in her framework. She did not explicitly consider the understanding of the generality of statements.

Most studies that have investigated students’ (or teachers’) understanding of the generality of statements and proof and the respective role and usage of (counter-)examples were qualitative with comparatively small sample sizes. The existing research suggests that some students and teachers do not completely understand the generality of true universal statements and proof (e.g., Balacheff, 1988b; Chazan, 1993; Galbraith, 1981; Knuth, 2002). For instance, Balacheff (1988b) observed 14 pairs of students aged 13 to 14 to explore their proving processes. He found that some students understand counterexamples as exceptions to the rule and not necessarily as refutations of the statement, which indicates a limited understanding of the generality of statements. Similarly, in a study conducted by Galbraith (1981) with about 170 students aged 12 to 17 from Australia, about 18% of the students thought that one (or a few) counterexamples are insufficient to disprove a (universal) statement.

In other studies, participants were confronted with (different types of) arguments and proofs for a statement. Chazan (1993), for example, investigated students’ beliefs about empirical evidence and deductive proof through semi-structured interviews with 17 high school students from geometry classes. He found that several students were not convinced that a deductive proof ensures that no counterexamples can be found. One student explicitly stated that it is impossible “to prove a statement for everything” (p. 372)–neither with empirical arguments nor with a deductive proof–a belief shared by other students in the study as well. These students seemed to have an insufficient understanding of the generality of proof and potentially of statements. However, some students were (correctly) convinced that a deductive proof guarantees that no counterexamples can exist. Because “a substantial number of students in this study” (p. 382) seem to hold beliefs that are contrary to those of the mathematical community, Chazan (1993) emphasizes the importance of explicitly discussing the characteristics of empirical arguments and deductive proof in mathematics classrooms. However, it is not clear whether teachers themselves have a complete understanding of the generality of statements and proofs and their characteristics, as some teachers seem to hold beliefs similar to those of students. Knuth (2002) conducted an interview study with 16 secondary school teachers to investigate teachers’ conceptions of proof. His findings suggest that several of the teachers do not have a complete understanding of the generality of statements and proofs. Six of the teachers thought that counterexamples, which would make the proof invalid, could still be found, even though they claimed to have understood (and accepted) the (ordinary) proof.
Further, several teachers did not seem to be completely convinced that no counterexamples to a proven statement can exist, as some needed to verify that the argument holds for particular cases. It is not clear whether this relates to a limited understanding of the generality of proofs, statements, or both.

Even though 72% of the teachers who participated in a study conducted by Barkai et al. (2002) correctly justified the falsity of a universal statement by giving a counterexample, only 36% seemed to think their argument would be accepted as proof. Furthermore, several teachers gave more than one counterexample, which suggests “that they do not believe that a single counterexample is sufficient to refute a universal statement” (Reid & Knipping, 2010, p. 64).
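The logical point behind these observations can be sketched as follows (the formalization is mine, added for illustration): a universal statement is false exactly when at least one counterexample exists, so a single counterexample suffices and further ones add nothing logically.

```latex
% Refuting a universal statement requires exactly one witness of the negation:
\neg \big( \forall x \in D:\; P(x) \big) \;\Longleftrightarrow\; \exists x \in D:\; \neg P(x)
% Example: "every odd number greater than 1 is prime" is fully refuted by the
% single counterexample $x = 9$; listing further counterexamples ($15, 21, \dots$)
% does not strengthen the refutation.
```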

In their influential study, Healy and Hoyles (2000) explicitly considered students’ understanding of the generality of statements. They assessed understanding of the generality of a proven statement by asking students if the proof “automatically held for a given subset of cases” (p. 402) or if a new proof had to be constructed. About 60% of the students correctly thought that no further proof is needed. In their study, Healy and Hoyles directly related understanding of the generality of universal statements to the generality of proof. Similar to other studies, they did not explicitly define what understanding of the generality of statements consists of.

Buchbinder and Zaslavsky (2019) investigated students’ understanding of the role of examples in proving. Their REP (Roles of Examples in Proving) framework highlights the relationship between (counter-)examples, statements, and proof (Table 2, p. 131). It is based on the logical relationship between examples and statements (see Table 3.1).

Table 3.1 Logical relationship between examples and statements, reprinted from Buchbinder and Zaslavsky (2019, p. 131), with permission from Elsevier

Because existential statements are not considered in the present thesis, only the column regarding universal statements is relevant here. It highlights two aspects of understanding the generality of universal statements in relation to proof: that examples (i.e., empirical arguments) only support the truth of a universal statement, but do not prove it, and that one counterexample is sufficient to refute the statement. Twelve high-attaining Grade 10 students from Israel participated in the study, which consisted of task-based semi-structured interviews. The students were interviewed in pairs “to create opportunities for verbal communication and spontaneous convincing and justifying” (p. 133). Tasks included the estimation of truth of several universal and existential statements (some of them true, some of them false) from algebra and geometry, the evaluation of arguments, and questions regarding the existence of objects with certain properties. Responses were coded according to the REP framework such that alignment with conventional mathematics was considered. The researchers found three types of inconsistencies in students’ responses: “(1) inconsistency with respect to the type of example [see Table 3.1], (2) inconsistency with respect to the type of statement [universal or existential], and (3) inconsistency with respect to the type of inference” (p. 139); for instance, regarding the “status of supporting examples [i.e., empirical arguments] in proving universal statements”. The number of observations where students were aware that examples do not prove a universal statement was equal to the number of observations where students held the contrary belief (16 observations each). Regarding the status of counterexamples in disproving universal statements, most students responded correctly (62 observations); the researchers found only 9 observations in which students thought that a counterexample does not disprove a universal statement.
Thus, while many participants seemed not to have fully grasped the generality of proof, most participants in this study seemed to have a correct understanding of the generality of universal statements. An interview study with 16 secondary school students conducted by Stylianides and Al-Murani (2010) revealed similar findings. The students were selected based on their responses to an earlier survey, in which they seemed to hold the belief that a proof and a counterexample can both exist for the same universal statement (as they assumed that both a proof and a counterexample to the same statement would get high marks from the teacher). However, Stylianides and Al-Murani found that all of these students correctly believed that a counterexample to a proven statement cannot exist.

In summary, several studies have investigated students’ or teachers’ understanding of the generality of proof and some of these considered the understanding of the generality of statements. However, all of these studies related the understanding of the generality of statements to that of proof. Moreover, what constitutes understanding of the generality of mathematical statements has not been explicitly defined so far and it is not clear how it relates to the understanding of (the generality of) proof as well as the usage and conviction of (counter-)examples. In particular, the existing research does not provide clear findings regarding the proportion of students and teachers with a limited understanding of the generality of statements (i.e., those who respond inconsistently regarding the estimation of truth of a universal statement and the existence of counterexamples) and aspects (such as the truth value of the statement and the type of argument) that might influence their understanding of the generality of statements.

The following section first provides an introduction to different activities that are related to proof and proving. Then, research findings regarding relevant activities with respect to the present project are discussed.

3.2 Activities Related to Proof

Research on students’ proof skills and understanding of proof and argumentation focuses on different activities. Mejía Ramos and Inglis (2009b) proposed a framework to classify the respective activities. Based on a general classification of mathematical activities provided by Giaquinto (2005)—making, presenting, taking in—Mejía Ramos and Inglis distinguish three main argumentative activities: “constructing a novel argument, presenting an available argument, and reading a given argument [emphasis added]” (Mejía Ramos & Inglis, 2009b, p. 68), which other researchers often refer to as proof construction, proof presentation, and proof reading (e.g., Selden & Selden, 2017). In order to consider different behavior due to different contexts, they further subdivide the three activities. Based on the work of de Villiers (1990) on different goals of proof, Mejía Ramos and Inglis (2009b) propose the classification of argumentative activities as shown in Figure 3.1.

Figure 3.1

Sub-activities related to proof by Mejía Ramos and Inglis (2009b)

According to their framework, proof reading comprises proof comprehension and proof evaluation, depending on the goal with which a proof is being read. The goal of proof comprehension is the understanding of a given argument. Aspects of proof comprehension include knowing the meaning of definitions and terms, understanding the meaning and logical status of statements within the proof, being able to summarize main ideas of the proof, and illustrating the proof with examples (Mejía Ramos et al., 2012; Neuhaus-Eckhardt, 2022; Yang & Lin, 2008) (see further section 3.2.4). In the framework of Mejía Ramos and Inglis (2009b), proof evaluation consists of both the validation of proofs (i.e., determining their correctness) and other evaluative tasks, such as assessing if a proof is convincing or explanatory. Similarly, Pfeiffer (2011) describes proof evaluation as “determining whether a proof is correct ... and also how good it is regarding a wider range of features such as clarity, context, sufficiency without excess, insight, convincingness or enhancement of understanding” (p. 5). To distinguish between proof validation and proof evaluation, A. Selden and Selden (2017) separate validation tasks from proof evaluation; regarding proof evaluation, they focus on the judgement of qualitative aspects such as convincingness, clarity, context, and aesthetics (see also Inglis & Aberdein, 2015). Similar to proof reading, Mejía Ramos and Inglis (2009b) further subdivide proof presentation into sub-activities with different goals. All four of these sub-activities have in common that an argument is presented to an audience, but they differ with respect to different functions of proof (e.g., the argument is presented to convince the audience of the truth of a statement or to provide an explanation why a statement is true).
In the understanding of Mejía Ramos and Inglis (2009b), proof construction is not only about finding and giving arguments to justify a statement, i.e., justification (see also section 2.4.1); their framework equally includes problem exploration and the estimation of truth of a statement as important aspects of proof construction.

Particularly relevant for the present thesis are the two sub-activities proof comprehension and proof evaluation regarding the reading of given arguments and the sub-activities estimation of truth and justification regarding the construction of (novel) arguments. I therefore do not discuss presenting available arguments further. Like Mariotti (2006), I assume that it is not possible to “isolate proof from the statement to which it provides support, and from the theoretical frame within which this support makes sense” (p. 184).

Therefore, to highlight the importance of the statement itself, I propose an adapted version of the framework of Mejía Ramos and Inglis (2009b) by adding the activity reading a statement (see Fig. 3.2). The presentation of arguments could potentially also be included in the adapted framework; however, it seems to be more difficult to relate it directly to the other activities. In the adapted framework, the sub-activity estimation of truth can be viewed as part of reading a statement of which the truth value is initially unknown (Footnote 1). The conclusion of whether or not a statement is true can be drawn either by constructing an argument oneself or by reading a given argument (although on most occasions where an argument is provided, the reader already knows that the statement is true). Also, a deeper comprehension of a statement can be supported by both of these activities. Similar to proof comprehension, the comprehension of a statement contains aspects such as knowing and understanding the meaning of definitions, terms, and symbols, its logical structure, and its relation to other statements, as well as understanding the generality of the statement, i.e., understanding that there cannot be any counterexamples if the statement is universal and true (see also section 3.2.1). The comprehension of a given argument is therefore closely related to the comprehension of the statement itself. Even more: without fully understanding the statement (including relevant definitions and terms, etc.), it seems to be impossible to fully understand or construct an argument. Furthermore, the necessity for and generality of proof might only become clear when completely understanding the generality of the statement.

Figure 3.2

Adapted framework on proof-related activities based on Mejía Ramos and Inglis (2009b)

Before discussing main research findings related to the respective activities, quantitative results from the literature reviews of Mejía Ramos and Inglis (2009a) and Sommerhoff et al. (2015) regarding argumentative activities (Sommerhoff et al. use the term situations) are summarized. Both studies found that the majority of mathematics education literature is on proof construction, followed by proof reading, and (almost) no literature was found on proof presentation (see Table 3.2 for a comparative overview of the studies and the respective results). The articles in the bibliographical study of Mejía Ramos & Inglis were further allocated to argumentative sub-activities (as shown in Figure 3.1). Regarding proof construction, most articles (54%) were about problem exploration, followed by the “justification of a statement” (27%); the fewest articles (20%) were on the “estimation of the truth of a conjecture”. Articles on proof reading were mostly on the evaluation of the argument (87.5%); only 12.5% were on the comprehension of arguments. One reason for this might be that it is more difficult to develop assessment models for proof comprehension than for other activities such as proof construction, because one first has to define what proof comprehension consists of (see further section 3.2.4 on recent developments). Mejía Ramos and Inglis (2009a) argue that further research on proof comprehension and proof presentation is needed, because both of these (sub-)activities have received little attention in mathematics education so far, even though they can be viewed as two key argumentative activities.

Table 3.2 Comparative overview of descriptive literature reviews on argumentative activities

In the following sections, research findings related to the comprehension of statements, the estimation of the truth of statements, the comprehension of arguments, the evaluation of arguments (particularly with respect to convincingness), and justification (with a focus on different types of arguments being used, i.e., so-called proof schemes) are reviewed with respect to the present research interest. Research on problem exploration is not considered, as it was not explicitly investigated in the present thesis. Lastly, relations between these activities are discussed.

3.2.1 Comprehension of Statements

In the literature on proof and proving, little attention has been given explicitly to the comprehension of statements themselves. Often, the understanding of the statement is seen as part of proof comprehension (see section 3.2.4), which in the framework of Mejía Ramos and Inglis (2009b) belongs to proof reading. As discussed in the previous section, in the present thesis the comprehension of a statement is understood as an integral part of reading a statement, which is related both to the reading, and in particular the comprehension, of proofs and to proof construction. In section 3.1, research findings on the understanding of the generality of statements, which can be seen as part of the comprehension of statements, have already been discussed in detail. Thus, this section summarizes the main findings on students’ comprehension of statements with respect to the statements’ content and logical structure.

Regarding the comprehension of mathematical statements, students seem to have difficulties both with the logical structure of statements (e.g., Dubinsky & Yiparaki, 2000; Moore, 1994; A. Selden, Mckee, & Selden, 2010; J. Selden & Selden, 1995) and with (basic) understanding of the content and relating it to a mathematical theory (e.g., Dubinsky & Yiparaki, 2000; Ferrari, 2002). In particular, students often seem to neglect unpacking the statement’s conclusion and focus solely on the assumptions, which limits their understanding of the statement and their attempts to prove it (Moore, 1994; A. Selden et al., 2010). Correctly unpacking the logical structure of a statement is seen as an important activity for understanding the statement and a necessity for constructing and evaluating proofs (e.g., J. Selden & Selden, 1995). By unpacking the logical structure of an informal statement, J. Selden and Selden (1995) mean “associating with it a logically equivalent formal statement” (p. 128). They examined 61 US university students’ abilities to understand and use the logical structure of mathematical statements in proof construction and validation by providing them with four informal calculus statements (two true, two false) which the participants had to unpack. They report that no student was able to unpack all informal statements into equivalent formal ones in first-order logic “which they were familiar with” (p. 139). The percentage of students who gave a correct logical unpacking of a statement was between 0 and 20.8%, depending on the statement. Students not only seem to struggle with unpacking statements into (more) formal ones, but also with the translation of terms used in informal statements into (simple) symbolic expressions, as Piatek-Jimenez (2004) reports.
She illustrates this observation with the following example: one student in her study “converted her memorized definition of m being odd [the author probably means even] from ‘m is an integer divisible by 2’ into the symbols ‘\(m=\frac{n}{2}\)’ ” (p. 195). If a statement is given informally and a respective proof uses symbolic expressions, students with these kinds of difficulties would most likely struggle to follow the proof (as well as to construct one).
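For comparison, a conventional symbolic unpacking of the definition in question (my formalization, not taken from the cited study) would be:

```latex
% "m is even" (an integer divisible by 2) unpacks to an existential condition:
m \text{ is even} \;\Longleftrightarrow\; \exists\, n \in \mathbb{Z}:\; m = 2n
% The student's version, $m = \frac{n}{2}$, instead asserts that $m$ is half of
% some number $n$, a condition satisfied by every integer (take $n = 2m$),
% and therefore fails to capture divisibility by 2.
```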

Dubinsky and Yiparaki (2000) also report on students’ difficulties with the logical structure of statements. They conducted a study with 63 students from two US universities and one liberal arts college (mainly mathematics and mathematics education majors) in which participants had to estimate the truth value of eleven statements, nine of them set in everyday contexts (e.g., “every pot has a cover”) and two of them mathematical statements, and to justify their decisions. The authors were particularly interested in students’ understanding of universal existential statements and existential universal statements (see also section 2.1). The two mathematical statements were therefore chosen such that they basically only differed in the order of quantifiers (Footnote 2). Dubinsky and Yiparaki (2000) found that about 42% of the participants incorrectly assumed that the two statements were equivalent. This finding indicates that students have difficulties with the interpretation of quantified statements, which was confirmed by the study conducted by Piatek-Jimenez (2004). Dubinsky and Yiparaki (2000) furthermore found that students mainly focused on the particular content of the statement and on how that content relates to their experienced reality, as they “referred to a world they were already familiar with and they considered that the statement described that world” (p. 53). The students were not able to understand and unpack the (logical) structure of the statements. Moreover, they had more difficulties with correctly interpreting and justifying statements when the context was mathematical, which the authors see as an indicator of students’ difficulties on a semantic level.
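The effect of quantifier order can be illustrated with a classic pair of statements (these are illustrative and are not the items used in the study):

```latex
% Universal existential (AE): for every real number there is a larger one.
% True, since for each $x$ one may choose $y = x + 1$:
\forall x \in \mathbb{R}\;\, \exists y \in \mathbb{R}:\; y > x
% Existential universal (EA): some real number is larger than all reals.
% False, since no candidate $y$ satisfies $y > y$:
\exists y \in \mathbb{R}\;\, \forall x \in \mathbb{R}:\; y > x
% Swapping the quantifiers thus changes the meaning, and here the truth value.
```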

Fundamental difficulties were reported by Ferrari (2002), who conducted a study with 39 Italian first-year computer science students. He identified several difficulties regarding the participants’ comprehension of elementary number theory statements with respect to the content and language of the statement. For instance, students (1) had poor understanding of basic definitions; (2) lacked conceptual understanding of the content (e.g., division and fractions), as they used simplifications in a way that overemphasized procedural aspects; and (3) had difficulties with distinguishing between a number and its representation.

The findings reported above highlight the importance of (basic) content knowledge, including knowledge about (school) mathematical concepts, which has also been identified as an essential prerequisite for a successful transition from school to university (e.g., Rach & Ufer, 2020). Further, many, if not most, students also seem to have difficulties with understanding and unpacking the logical structure of statements. These findings are not only relevant for the teaching and learning of proof and argumentation but should also be considered in the development of research instruments that aim to assess students’ proof skills (see section 5.3).

Fully comprehending (the meaning of) a mathematical statement is not only necessary for proving it or understanding a given proof, but it is also important for getting a better intuition regarding its truth value. The following section summarizes research findings regarding students’ success in estimating the truth value of statements as well as their respective strategies.

3.2.2 Estimation of Truth of Statements

Deciding whether or not a statement is true can be seen as an essential activity in mathematics. Consequently, many curricula suggest that students should be able to make, evaluate (e.g., estimating the truth value), and justify mathematical statements (e.g., Department of Basic Education, 2011; Kultusministerkonferenz, 2012; National Council of Teachers of Mathematics, 2000). For instance, the German national curriculum for upper secondary schools states that “this competence [referring to argumentation] includes ... the understanding and evaluation of given mathematical statements” (Kultusministerkonferenz, 2012, p. 14, translated by the author). Similarly, from prekindergarten through grade 12, students in the US should be enabled to “make and investigate mathematical conjectures” (National Council of Teachers of Mathematics, 2000, p. 56).

Of all articles on proof and argumentation that were included in the literature review of Mejía Ramos and Inglis (2009a), approximately 12% were about the estimation of the truth of a conjecture. Studies have reported a wide range of percentages of participants correctly judging the statements’ truth values, ranging from about 30 to 100% (Barkai et al., 2002; Dubinsky & Yiparaki, 2000; Hoyles & Küchemann, 2022; Ko, 2011; Ko & Knuth, 2009; Riley, 2003; Zeybek Simsek, 2021). In addition to investigating students’ or teachers’ success in estimating the truth value of mathematical statements, some studies have examined its relation to the type of statement (e.g., universal or existential, true or false) and/or the strategies students or teachers use to come to a respective conclusion. For instance, Barkai et al. (2002) report on 27 elementary school teachers’ judgements regarding the truth of three universal and three existential statements, some of which were true and some false. The percentage of teachers who correctly estimated the truth value was between 68 and 100%; thus, most were comparatively successful in estimating the truth (see Tab. 3.3). Descriptively, it seems that it was more difficult for them to correctly decide on the truth value of existential statements (the true existential statement #4 is the same as #1 and therefore true for all n, thus it can be seen as an exception), in particular regarding statements that are true for some n, regardless of whether the statement was expressed as a false universal or a true existential statement. However, the numbers of participants and items are comparatively small and might therefore not be representative.
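The distinction between a false universal statement and a true existential statement over the same property can be illustrated as follows (my own example, not one of Barkai et al.’s items):

```latex
% False universal statement: the property holds for some n, but not all,
% so a single counterexample (e.g., n = 2) refutes it.
\forall n \in \mathbb{N}: n^2 = n

% True existential statement over the same property:
% one witness (e.g., n = 1) suffices to verify it.
\exists n \in \mathbb{N}: n^2 = n
```

Both statements concern the same property, which holds exactly for n = 0 and n = 1; deciding their truth values requires attending to the quantifier, not only to the property.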

Table 3.3 Teachers’ estimation of truth by truth value and domain of discourse of statements; findings reported by Barkai et al. (2002), table adapted from Reid and Knipping (2010, p. 70), with permission from Brill

Comparatively high success rates were also reported by Ko (2011). She conducted semi-structured interviews with eight secondary mathematics education majors from a US university, who were in their third, fourth, or fifth year. The majority of participants (about 83%) correctly estimated the truth value of six statements (four of them true, two of them false; from different content areas). However, one of the false statements was correctly evaluated by only half of them (the other one by 87.5%). Moreover, most of the students (successfully) used mixed reasoning strategies in which “individuals both use examples to identify relevant patterns and structures, and manipulate (partially) correct properties, definitions, and/or theorems to identify a reasonable example to attempt to prove or disprove the statement” (p. 481). Fewer participants used other strategies such as deductive arguments (the author refers to sophisticated reasoning) or purely empirical arguments. A false statement that has been used in several studies is that if the perimeter of a rectangle increases, its area also increases. The percentage of preservice teachers who incorrectly evaluated this statement to be true has been reported to be comparatively high: About 57% of 23 preservice secondary school teachers from a US university (Riley, 2003) and 72% of 50 preservice middle school teachers from a Turkish university (Zeybek Simsek, 2021) thought the statement was correct. Some researchers argue that it is no surprise that students particularly struggle with correctly estimating the truth value of false statements, because in schools (and universities) students usually have to prove true statements rather than disprove false ones (e.g., Buchbinder & Zaslavsky, 2007; Ko, 2011).
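That the statement about perimeter and area is false can be seen from a single counterexample (my own numbers, not taken from the cited studies):

```latex
% A 4 x 4 square: perimeter 16, area 16.
P_1 = 2(4 + 4) = 16, \qquad A_1 = 4 \cdot 4 = 16

% A 1 x 10 rectangle: the perimeter increases to 22,
% but the area decreases to 10.
P_2 = 2(1 + 10) = 22, \qquad A_2 = 1 \cdot 10 = 10
```

Such a counterexample requires changing the rectangle’s shape, not merely its size: enlarging a rectangle uniformly increases perimeter and area together, which may make the statement appear plausible.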
However, Ko and Knuth (2009) found no noteworthy difference regarding the success rates of 36 Taiwanese mathematics undergraduates (most of them prospective secondary school teachers). For both a true and a false statement, the percentage of students who failed to correctly estimate the truth value was about 20%. Most of these students provided an incorrect counterexample for the true statement and an incorrect proof for the false one. More research is needed to better understand the influence of the truth value on students’ success in estimating the truth of statements.

To better understand students’ usage and understanding of (counter-)examples in the process of estimating the truth value of a statement, some researchers have investigated how students estimate the truth or falsity of statements (see also section 3.2.5 for further research on types of arguments used by students and teachers to justify statements). In this regard, Buchbinder and Zaslavsky (2007) conducted a study to identify students’ strategies in determining the truth value of statements. They found that the first step “was based on their intuition and sense of confidence” (p. 563). Depending on their confidence regarding the truth value of the statement, students searched for evidence (either by deduction or based on empirical arguments) or directly gave empirical arguments to support their assertion. Empirical argumentation thereby occasionally led to “students shift[ing] to a different decision”, for instance, when they “have found by chance a counterexample that contradicted [their] decision” (p. 564). Thus, as has been emphasized by other researchers, experimentation is not only common but potentially useful for exploring conjectures (e.g., de Villiers, 2010; Lockwood, Ellis, & Lynch, 2016).

To estimate the degree to which mathematicians use examples to evaluate and prove universal statements, Alcock and Inglis (2008) conducted two case studies with doctoral mathematics students. The participants were asked to decide whether several statements were true and to justify their decision with a proof. Both doctoral students used empirical arguments in the interviews; however, “the degrees to which they invoked examples to support their reasoning were strikingly different” (p. 126). While one participant did not seem to use empirical arguments to get a better understanding of the statements, the other one did. Alcock and Inglis (2008) conclude that these findings highlight the usefulness of skills involving experimentation for the exploration of statements.

In summary, the success rates of students’ and teachers’ evaluation of the truth value of mathematical statements differ substantially and seem to depend on characteristics of the statement, such as its actual truth value and its domain of discourse (whether it is true for all, for some, or for no entities/cases). More research is needed to identify specific relations, for instance, regarding the influence of the statements’ truth value. Moreover, example-based reasoning (or a mixed strategy including some deductive arguments) seems to be a common and useful approach–even though to different degrees–to gain an understanding of a statement and a better intuition regarding its truth value. However, whether and how the usage of empirical arguments is related to students’ success in estimating the truth value of statements and to their understanding of the generality of statements is unclear and needs to be investigated.

In the following section, research findings regarding students’ proof evaluation are summarized, with a particular focus on students’ conviction and their acceptance of different types of arguments.

3.2.3 Proof Evaluation

The evaluation of given arguments is another essential sub-activity of proof reading. As already stated in section 3.2, proof evaluation can include different aspects, such as the validation of arguments, i.e., deciding whether an argument is correct (i.e., a valid proof), and other evaluative activities, for instance, assessing whether an argument is convincing or explanatory. Being able to decide whether an argument is valid is seen as an important skill, for both students and teachers, and a necessity for establishing a proper understanding of proof (Pfeiffer, 2011; Powers, Craviotto, & Grassl, 2010; A. Selden & Selden, 2003; Sporn, Sommerhoff, & Heinze, 2021; Weber, 2010). In particular, researchers have argued that evaluative activities in general may be beneficial for the learning of proof construction, because both activities rely on knowledge of acceptance criteria for proof (e.g., Pfeiffer, 2011; A. Selden & Selden, 2003). The focus of this section is on students’ proof evaluation regarding the conviction and validity of arguments, but not on other aspects, such as how explanatory an argument is. Before empirical research on proof evaluation is reviewed, the relation between conviction and validity is discussed.

3.2.3.1 Conviction and Validity

Even though some researchers define the evaluation of the validity of arguments as a separate activity, namely that of proof validation (e.g., A. Selden & Selden, 2017, see also section 3.2), proof validation and other evaluative activities are generally not independent of each other. The degree of conviction is influenced by several aspects, for instance, by the perceived validity of the argument (see section 2.3.3). If someone identifies an argument as not valid, they will likely also find it not (completely) convincing. On the other hand, the sole existence and acceptance of a proof does not necessarily lead to conviction (Fischbein, 1982; Segal, 1999; Weber, 2010). For instance, in a study conducted by Fischbein (1982), many of the participating high school students felt the need to verify the truth of a statement through empirical investigations, even though they had claimed that the provided argument was a valid proof.

It is not always clear what is meant by conviction and how students interpret questions regarding their conviction of arguments. Conviction is sometimes used with respect to the validity of proofs (e.g., Weber & Mejia-Ramos, 2015), often to express different degrees of conviction in the validity of proofs (see also the discussion of relative and absolute conviction further below). More often, however, conviction relates to the truth of a statement (see, e.g., Segal, 1999). In this sense, questions regarding students’ conviction aim at investigating whether the reading (or construction) of (certain types of) arguments leads to conviction of the truth of a statement. As mentioned above, several aspects may influence students’ conviction of the truth of a statement, in particular, the perception of the argument as proof.

However, one can gain high levels of conviction for the truth of a statement by reading an argument without accepting the argument as a proof (Tall, 1989; Weber, 2010) or even without the existence of a proof, as de Villiers (1990) highlights: “Proof is not necessarily a prerequisite for conviction—to the contrary, conviction is probably far more frequently a prerequisite for the finding of a proof” (p. 18). This observation had also been made by Polya (1954), who argues that “without ... confidence [in the truth of the theorem] we would have scarcely found the courage to undertake the proof which did not look at all a routine job.” (pp. 83–84).

As discussed in section 2.3, proof–in the social sense–is not a clearly defined concept, but depends on socio-mathematical norms. Thus, the validation of an argument depends on the respective mathematical community. To highlight the different psychological aspects that are involved in proof validation and conviction, Segal (1999) distinguishes between personal conviction (convincing oneself) and public validation (persuading others). She found that first-year mathematics students from a UK university showed personal conviction regarding empirical arguments but no public validation, indicating that they assume empirical arguments do not meet the requirements of proof. Regarding ordinary proofs, however, no such difference was found: The students either found these types of arguments convincing and thought they were proofs, or neither. To capture different degrees of conviction, Weber and Mejia-Ramos (2015) have introduced the terms relative and absolute conviction, to which I return further below.

Due to their relevance for the present thesis, the following sections outline research findings regarding students’ evaluation of different types of arguments: empirical arguments and generic and ordinary proofs.

3.2.3.2 Findings on the Evaluation of Empirical Arguments

Healy and Hoyles (2000) found evidence that students are often convinced by empirical arguments, but at the same time recognize their limitations and thus do not accept those arguments as valid proofs. Several studies have confirmed that most students seem to be aware of the limitations of empirical arguments (e.g., Healy & Hoyles, 2000; Lesseig et al., 2019; Stylianou et al., 2015; Tabach, Levenson, et al., 2010). Combined, these findings are in line with those of Segal, indicating that the distinction between personal conviction and public validation may be useful. However, other studies found the contrary, namely, that some students (and teachers) seem not only to be convinced by empirical arguments but also judge them as valid proofs (Gholamazad et al., 2004; Knuth, 2002; Martin & Harel, 1989). For instance, in a study with 101 preservice elementary teachers from a US university conducted by Martin and Harel (1989), more than half of the participants accepted inductive arguments (Footnote 3) as proof for both familiar and unfamiliar statements. The convincing power of empirical arguments is underlined by findings reported by Bieda and Lepak (2014). They conducted an interview study with 22 junior high school students from the US. The majority of the participants (15) chose an empirical argument over a general argument as being more convincing, mainly because they claimed that empirical arguments enhanced their comprehension of the statement and provided more information. Four participants found the generic argument more convincing, and two of them explicitly referred to the lack of generality of the empirical arguments.

Other studies did not find evidence that students find empirical arguments convincing. For instance, Weber (2010) conducted a study with 28 mathematics students from a US university who had completed a transition-to-proof course. Most participants neither found the empirical arguments provided in the study convincing nor thought they constituted a proof. These findings were reproduced by D. Miller and CadwalladerOlsker (2020), who investigated 38 mathematics students from a US university who were also enrolled in a transition-to-proof course. Thus, more advanced mathematics students do not seem to find empirical arguments convincing. Similar results were found by Ko and Knuth (2013), who investigated 55 middle school mathematics teachers’ proof evaluation, and by Sommerhoff and Ufer (2019), whose findings show that most of the participating high school and university students judged the empirical arguments as not being valid proofs. Even though about 70% of the more than 650 German high school students who participated in a study conducted by Ufer, Heinze, Kuntze, and Rudolph-Albert (2009) correctly judged empirical arguments as invalid, only about one third of them could explain why these arguments do not meet the criteria for proof. The authors argue that the students seem to be familiar with the fact that empirical arguments are insufficient to prove a universal statement, but not in a way that enabled them to explain why.

3.2.3.3 Findings on the Evaluation of Generic Proofs

Similar to empirical arguments, research findings on students’ and teachers’ evaluation of generic proofs seem to be inconsistent. Some studies suggest that many students do find generic proofs (in particular diagrammatic arguments; Footnote 4) convincing and think they constitute a proof (Ko & Knuth, 2013; Weber, 2010). Even though not all participants of the study conducted by Martin and Harel (1989) accepted generic proofs (the authors refer to particular proofs), those who (correctly) accepted an ordinary proof for a statement also gave high acceptance ratings to the respective generic proof. However, other studies have found the opposite. Tabach, Barkai, et al. (2010), for instance, conducted a study with 50 high school teachers and found that about half of the participants did not accept the provided generic proofs (the authors refer to verbal justifications) because of a perceived absence of generality. The (perceived) generality of an argument seems to be a criterion teachers often consider when evaluating a proof (Ko & Knuth, 2013). Moreover, the mode of representation was another reason why participants in the study of Tabach, Barkai, et al. rejected correct generic proofs. Further evidence that teachers assume generic proofs do not meet the criteria for proof was provided by Lesseig et al. (2019). They found that the majority of secondary school teachers who participated in their study did not accept the presented atypical proofs (a generic proof and a visual argument). Further, some participants explicitly stated that these arguments did not convince them. The focus in both studies was on teachers’ acceptance of arguments. A study conducted by Kempen (2018) aimed at investigating preservice teachers’ evaluation regarding verification as well as conviction. He found that generic proofs received low ratings regarding verification, while the ordinary proof (the author refers to formal proof) received very high ratings. While ratings regarding conviction were higher than those regarding verification, the generic proofs still received lower ratings than the ordinary proofs. In comparison to empirical arguments, Kempen (2021) found that students gave generic proofs higher ratings for both familiar and unfamiliar statements, which may indicate that students “do not mix up the idea of generic proofs with purely empirical verifications” (p. 4).

3.2.3.4 Findings on the Evaluation of Ordinary Proofs

Most students (and teachers) find ordinary proofs convincing and accept them as proof (e.g., Knuth, 2002; Ko & Knuth, 2013; Martin & Harel, 1989; Ufer et al., 2009), regardless of their familiarity with the statement (see, e.g., Martin & Harel, 1989). However, Weber (2010) observed that some students in his study did not find ordinary proofs convincing, even though they accepted them as proof, in line with findings reported by Fischbein (1982), as discussed above. Interviews conducted with the respective participants indicate that one reason for these contradictory responses is that these students seem not to have fully understood the proof (but nevertheless stated the proof was valid). Further, as discussed in section 3.1, some students believe that all arguments–ordinary proofs as well as other types of arguments such as empirical ones–can only provide evidence for a statement, but cannot guarantee its truth (Chazan, 1993).

Several studies suggest that students also evaluate invalid ordinary proofs as convincing or think they constitute a proof (Knuth, 2002; Martin & Harel, 1989; A. Selden & Selden, 2003; Weber, 2010). This indicates that students (and teachers) tend to focus on surface features, such as the use of mathematical symbols and algebraic manipulations, instead of on the content and logical structure of the argument (Harel & Sowder, 1998; Inglis & Alcock, 2012; Knuth, 2002; A. Selden & Selden, 2003), which researchers sometimes refer to as ritualistic aspects of proof (e.g., Martin & Harel, 1989). A focus on the form of an argument has also been reported by Ufer et al. (2009): Most of the German high school students (about 80%) correctly evaluated the ordinary proof (the authors refer to a “formally expressed ... solution”, p. 41, in German “formal dargestellte ... Lösung”), but fewer students (about 66 to 68%) did so regarding a correct narrative proof, suggesting that the (perceived) formality of a proof plays a role in its acceptance.

Moreover, Stylianou et al. (2015) found that students’ answers regarding proof evaluation and construction can be contradictory: The students in the study were asked which argument (out of four) was closest to one they would produce themselves. Most students chose either a (numeric) empirical argument, a narrative deductive proof, or a symbolic deductive proof (each was chosen by about one third of the participants). But when the students were asked to construct proofs for the same statements, the majority constructed numeric empirical arguments (45 to 75%) or narrative deductive proofs (14 to 45%). The students’ answers regarding which arguments they would give when asked to justify a statement thus did not fully reflect the types of arguments they actually construct, but were most likely influenced by what they thought would be accepted as proof. This highlights that students are often aware that empirical arguments do not meet the standards for proof and that general, deductive arguments are necessary. However, they are often not able to produce these types of arguments, a finding which has repeatedly been reported before, for instance, by Healy and Hoyles (2000).

3.2.3.5 Aspects that Influence Students’ and Teachers’ Proof Evaluation

Several aspects that may influence students’ and teachers’ evaluation of arguments have already been mentioned above, for instance, the form or representation of an argument (e.g., A. Selden & Selden, 2003; Tabach, Barkai, et al., 2010; Ufer et al., 2009), the perceived generality (Bieda & Lepak, 2014; Ko & Knuth, 2013; Tabach, Barkai, et al., 2010), and the comprehension of the argument (Bieda & Lepak, 2014; Weber, 2010). This section provides a summary of aspects that have been identified. Only few studies have explicitly investigated which aspects influence students’ or teachers’ proof evaluation (Footnote 5). Ko and Knuth (2013) have identified several characteristics that may influence teachers’ conviction and judgement of validity, including the ones just mentioned. Other aspects they identified include the clarity and explanatory power of the argument, familiarity with the type of argument (the authors refer to similarity), and the usage of mathematical facts (e.g., definitions or theorems). They report that the 55 participating middle school teachers most often referred to the generality of the argument regarding their conviction and to the usage of algebraic rules or mathematical symbols regarding the validity of arguments. In part, the coding scheme for acceptance criteria proposed by Sommerhoff and Ufer (2019) contains aspects similar to those identified by Ko and Knuth, such as the usage of counterexamples, understanding of the argument, and aesthetics. Other characteristics differ, however. In particular, Sommerhoff and Ufer considered the structure of the proof, the proof scheme, and the logical chain in their coding, which were proposed as components of students’ methodological knowledge by A. Heinze and Reiss (2003). Moreover, during their coding process, they identified additional categories, for instance, references to the argument (not) being a mathematical proof or the requirement that proofs be unambiguous.
Sommerhoff and Ufer (2019) used this coding scheme for acceptance criteria to analyze school and university students’ as well as active mathematicians’ justifications of why they considered the purported proofs to be valid or not. They found that proof structure, proof scheme, logical chain, and understanding seemed to be the most important acceptance criteria overall. They emphasize that understanding seemed to be particularly relevant for school and university students.

In the following section, I reflect on the research findings on students’ proof evaluation, for instance, regarding their alignment with mathematicians’ proof evaluation and differences in research approaches.

3.2.3.6 Reflection on Research Findings on Proof Evaluation

In the research findings outlined above, several aspects can be identified that may influence results on proof evaluation and therefore possibly limit comparability:

  • the different (mathematical) backgrounds of participants (e.g., preservice teachers vs mathematics majors, differences in age, etc.)

  • the different research foci and designs (e.g., conviction vs validity, phrasing of questions, etc.)

  • a different (and sometimes unclear) understanding of (the level of) conviction.

The first two aspects provide indications of potential influencing factors and conditions under which students might find particular types of arguments convincing or accept them as proof, and of how conviction and validation might be connected. These should be considered in future investigations. To address the last aspect–a different understanding of conviction–Weber and Mejia-Ramos (2015) have introduced the notion of relative and absolute conviction: Relative conviction is thereby defined as a “subjective level of probability” regarding the truth of a claim that “exceeds a certain threshold”; absolute conviction, however, is “a stable psychological feeling of indubitability about that claim” (p. 16). Weber and Mejia-Ramos (2015) argue that mathematicians have absolute conviction in certain statements, mainly those that have been well known for a long time, but also relative conviction regarding the truth of “more sophisticated claims” (p. 16) as well as the validity of proofs. As an example, they refer to Hales’s proof of Kepler’s conjecture about sphere packing in Euclidean space (a computer-assisted proof by exhaustion) that was published in 2005 in the Annals of Mathematics (Hales, 2005). According to the editor of the journal, “the reviewers were only 99% sure the proof was valid” (see Weber & Mejia-Ramos, 2015, p. 16). Even though the editor and reviewers only had (high) relative conviction and not absolute conviction in the validity of the proof, (part of) the proof was published (Footnote 6). The concept of relative and absolute conviction is related to what Duval (1990) refers to as the epistemic value of a statement, that is, “a personal judgement of whether and how the proposition is believed. It can take on values such as opinion, belief, certainty, principle, hypothesis, etc.” (Reid & Knipping, 2010, p. 74). In theory, mathematical proof “has the function of changing the epistemic value of a statement, for example from conjecture to theorem” (p. 75), thus leading to absolute conviction in the truth of the statement.

With respect to the assessment of findings on students’ proof evaluation, Weber and Mejia-Ramos (2015) argue that researchers should verify whether students have relative or absolute conviction regarding the truth of statements and the validity of proofs. For instance, students’ conviction by empirical arguments is not necessarily problematic if these arguments only lead to relative conviction in the truth of the statement. Similarly, students having doubts about the truth of a statement “after reading or producing a proof of the statement” (p. 19) may also be appropriate if students only have relative conviction in the validity of the proof. The distinction between relative and absolute conviction may therefore be useful in interpreting and comparing research findings on students’ proof evaluation, regarding both conviction in the truth of statements and conviction in the validity of proofs.

To assess students’ evaluation of proof (and other proof-related activities), many researchers in mathematics education refer to mathematicians’ conceptions of proof and their respective acceptance criteria as a benchmark (e.g., Dawkins & Weber, 2017; Harel & Sowder, 2007; Stylianides, 2007; Weber, 2013; Weber & Czocher, 2019). Thereby, proving practices in mathematics classrooms are not expected “to be exact replicas of professional mathematical communities” (Weber & Czocher, 2019, p. 253), but general standards for the acceptance and understanding of proof should be consistent with those of the mathematical community (Dawkins & Weber, 2017; Harel & Sowder, 2007; Footnote 7). Respective acceptance criteria have already been discussed in section 2.3.3. In summary, based on Weber and Czocher (2019), mathematicians seem to

  • agree on the acceptance of typical arguments;

  • agree on the non-acceptance of invalid proof schemes, for instance, empirical arguments;

  • disagree on atypical arguments such as visual arguments and computer-assisted proofs.

Furthermore, being familiar with the form of reasoning and representation used in an argument can be seen as an important criterion for the acceptance of proof. In conclusion, students should accept typical arguments they are familiar with–which implies that they need sufficient experience before they can be expected to accept such arguments–but not invalid proof schemes, in particular empirical arguments. Less obvious is the case of atypical arguments, for instance, generic proofs: Should students accept those as proof? As there might not be a clear consensus among mathematicians regarding the validity of such arguments, it is difficult to give an absolute answer. It seems more useful in this regard to let students explain why they accept an argument and to assess whether their reasoning is consistent with the characteristics of proof as outlined in section 2.3.3. Moreover, as noted above, deciding whether an argument leads to (relative or absolute) conviction in the truth of the statement and deciding whether it meets the criteria for proof (personal conviction vs public validation) might lead to different outcomes.

Even though there seems to be a consensus among mathematicians that empirical arguments should not be accepted as proof, a considerable percentage of them claim to nevertheless find empirical arguments convincing (under specific conditions), as Weber (2013) found. He conducted an experimental study in which 97 research-active mathematicians participated. The main findings of the study are, firstly, that mathematicians seem to find empirical arguments (Footnote 8) more convincing if the respective statement is about integers having some property than if it is about integers not having some property. And secondly, that the mathematical domain seems to influence the persuasiveness of arguments, as an empirical argument for a statement about modular congruence was more convincing for the participants than an empirical argument for a statement about generating primes. Furthermore, about 27% of the participating mathematicians claimed to have been convinced by empirical arguments at least once in their mathematical practice. Overall, the mean ratings of the persuasiveness of the empirical arguments were not very high, but Weber’s study nevertheless demonstrates that under certain conditions (some) mathematicians gain personal (relative) conviction from empirical arguments. Because students need “to understand under what condition this type of evidence [empirical proof] might be appropriate and informative” (Weber, 2013, p. 110), mathematicians’ proof evaluation should be considered when judging students’ evaluation (in particular of empirical arguments) and when teaching proof. Therefore, further research is needed on the conditions under which particular types of arguments (such as empirical ones or atypical arguments, e.g., generic proofs) can provide high levels of conviction for mathematicians.

In summary, research findings on students’ proof evaluation of different types of arguments are at least partially ambiguous. Most students (and teachers) seem to find ordinary proofs convincing and think they are valid–even when the proof is in fact incorrect. The degree to which students find generic and empirical arguments convincing and/or think they are proofs is less clear. More experienced students seem neither to find empirical arguments convincing nor to consider them proofs. Overall, the perceived generality, the representation of the argument, students’ understanding of the argument, and their methodological knowledge seem to influence whether and to what degree students judge different types of arguments as convincing or valid. More research is needed to investigate the degree to which students find different types of arguments convincing and what aspects influence their conviction. Distinguishing between relative and absolute conviction might not only lead to more consistent outcomes, it could also allow students’ actual conviction by (empirical) arguments to be assessed more accurately. Further, even mathematicians do not always agree on the validity of arguments and sometimes evaluate arguments as convincing even though they do not fully meet the criteria for proof (e.g., empirical arguments). These findings should be taken into account in the assessment of students’ (and teachers’) evaluation of proof.

The following section provides an overview of research on students’ (and teachers’) proof comprehension. First, developments in the assessment of proof comprehension are discussed. Following this, empirical findings on students’ and teachers’ proof comprehension are outlined.

3.2.4 Proof Comprehension

So far, in comparison to proof construction, not many studies have particularly focussed on the reading comprehension of arguments and proofs (Mejía Ramos & Inglis, 2009a; Neuhaus-Eckhardt, 2022; Sommerhoff et al., 2015), as already noted in section 3.2. This is somewhat surprising, because proof comprehension can be viewed as one of the main activities in mathematics university courses. Furthermore, the goal of teaching proof and argumentation in school (and university) is not mainly to convince students of the truth of a statement, but to enable a deeper understanding (e.g., de Villiers, 1990; Hanna, 1990; Hersh, 1993).

The assessment of proof comprehension in school and university is mostly carried out by asking students to reproduce a proof or to adjust it to a different context, although researchers argue that this does not provide sufficient insight into students’ actual proof comprehension, because correctly reproducing a proof can be achieved by solely memorizing it and does not necessarily require understanding (e.g., Conradie & Frith, 2000; Weber, 2012). Therefore, researchers have defined different aspects that may indicate proof comprehension (e.g., Bürger, 1979; Conradie & Frith, 2000; Kunimune et al., 2009; Pracht, 1979). Commonly, proof (reading) comprehension is thereby understood as the understanding of a particular (and valid) proof (Footnote 9). However, even though lists of relevant aspects have been collected, Mejía Ramos et al. (2012) point out that “what it means for a proof to be understood, and how we can tell if students comprehend a given proof remain open questions in mathematics education” (p. 4). Thus, to measure students’ proof comprehension more systematically, researchers have recently begun to create assessment models, which are discussed in the following section.

3.2.4.1 Assessment of Proof Comprehension

Particularly noteworthy among developments in operationalising the assessment of students’ proof comprehension are the works of Conradie and Frith (2000), Yang and Lin (2008), and Mejía Ramos et al. (2012). Conradie and Frith (2000) were among the first researchers to emphasize the importance of measuring proof comprehension at university level and to explicitly suggest proof comprehension tests (Footnote 10). They constructed two proof comprehension tests, which were used in a final exam for second-year university students at the University of Cape Town to measure students’ understanding of two particular proofs. Apart from providing specific test questions, Conradie and Frith (2000) summarize the following aspects of proof comprehension that could be tested individually: “understanding of specific steps ..., understanding of the structure of the proof ..., understanding of concepts used in the proof ..., understanding of assumptions and conclusions ... and understanding of some of the more subtle aspects of a proof” (p. 231).

The model of reading comprehension of geometry proof (RCGP) proposed by Yang and Lin (2008) for the learning of proof in secondary schools was the first research-based assessment model that aimed at defining and structuring relevant aspects of proof comprehension. It consists of five facets (basic knowledge, logical status, summarization, generality, application), which can be allocated across four different levels of understanding (surface, recognizing elements, chaining elements, encapsulation). For instance, the comprehension of generality is placed between the third and last level. In contrast to other researchers (see section 3.1), Yang and Lin (2008) define understanding of generality as understanding “what is really proved by this proof” (p. 70). In describing their model, Yang and Lin (2008) concentrated on the first three levels and did not further specify the highest level of encapsulation, explicitly stating that the RCGP model “is not aimed at diagnosing if a student has reached this top level” (p. 71). Furthermore, their model was particularly designed to measure geometry proof comprehension at secondary level. Therefore, Mejía Ramos et al. (2012) adapted the RCGP model of Yang and Lin to better fit the requirements at tertiary level. In particular, the model of Mejía Ramos et al. (2012) is supposed to expand on the highest level of the RCGP model, at which students need to understand the proof as a whole, for instance, by understanding its main idea. To identify and justify relevant aspects of proof comprehension, they reviewed literature on different goals and methods of proof discussed in mathematics education and conducted interviews with mathematicians regarding their conceptions of proof comprehension. Furthermore, they drew on the proof comprehension questions suggested by Conradie and Frith (2000). Mejía Ramos et al. (2012) identified seven types of questions, each of which “measures a different facet of proof comprehension” (p. 5).
They subdivided these facets into two groups: local and holistic aspects of proof comprehension. Local understanding thereby means the understanding of a particular statement within the given proof and its (logical) connection to other specific statements in the proof. Understanding the meaning of terms and statements (within the proof) is part of a local understanding, for example. In contrast, to gain holistic understanding, one has to understand the proof as a whole or at least its main parts, for instance, one has to be able to summarize the main idea of the proof or transfer the proof’s methods to another context. Thus, holistic understanding relates to the highest level of the RCGP model, encapsulation. Mejía Ramos et al. (2012) specify three types of questions that relate to local understanding and four that relate to holistic understanding. The three local aspects can be summarized as

  • understanding the meaning of terms and statements

  • understanding the logical status of statements and proof framework

  • being able to provide justifications of claims.

The four aspects of holistic understanding that Mejía Ramos et al. identified were particularly valued by the interviewed mathematicians. These types of questions consist of:

  • summarizing main ideas of the proof

  • identifying the modular structure

  • transferring the general idea or method to another context

  • illustrating (parts of) the proof with examples.

Even though Mejía Ramos et al. (2012) state that they do not view their model as hierarchical, they do not rule out the possibility that relationships between facets exist, for example, “being able to summarize the proof ... may be necessary in order to successfully transfer [the] ideas and methods to another context” (p. 16). The assessment model of Mejía Ramos et al. was used to design three reliable multiple-choice tests that “validly measure students’ comprehension of the proofs that they read” (Mejía Ramos, Lew, Torre, & Weber, 2017, p. 140). Thus, they have demonstrated a useful alternative to, for example, simply asking students to reproduce a proof as a measure of proof comprehension.

Another recent work on proof comprehension worth mentioning is the doctoral thesis of Neuhaus-Eckhardt (2022). She defines proof comprehension as the construction of a mental model of a valid proof in written form through processes of text comprehension (Neuhaus-Eckhardt, 2022, p. 36). Based on the assessment model of Mejía Ramos et al. (2012) and a literature review on proof comprehension, Neuhaus-Eckhardt proposes a list of aspects that indicate proof comprehension. She expands the model of Mejía Ramos et al. (2012) by introducing a third group, namely, aspects of proof comprehension beyond the particular proof (in German “über den Beweis hinausgehende Aspekte”, pp. 49–50). While Mejía Ramos et al. (2012) include aspects such as transferring to another context in the group of holistic understanding, Neuhaus-Eckhardt (2022) argues that such questions do not refer to the particular proof but to the underlying ideas and methods used in the proof. Thus, students’ ability to transfer the idea of the proof to another context, for instance, indicates an understanding beyond the particular proof.

In the following section, I summarize empirical research findings on students’ proof comprehension.

3.2.4.2 Findings on Proof Comprehension

Since literature on proof reading comprehension is scarce (Mejía Ramos et al., 2012; Neuhaus-Eckhardt, 2022), findings from studies on proof reading in general and on students’ difficulties with proof are also considered in this section.

Studies have found that students tend to read proofs line by line–Inglis and Alcock (2012) refer to this as zooming in–focussing on local aspects (A. Selden & Selden, 2003), in contrast to mathematicians, who, as already mentioned above, claim to value a holistic understanding of proof (e.g., Mejía Ramos & Weber, 2014). Inglis and Alcock (2012) conducted an eye-tracking study, which confirmed the findings of A. Selden and Selden (2003) that students tend to focus on surface features of proof, such as notational and computational aspects, instead of the logical structure of the arguments. Moreover, the mathematicians in their study “made nearly 50% more between-line saccades than the undergraduates” (Inglis & Alcock, 2012, p. 380), suggesting that mathematicians tried more often to connect statements between lines on a local level. However, no evidence was found that mathematicians “engage in zooming out” (p. 380), that is, a non-sequential reading strategy to identify links between different parts of the proof on a holistic level. This can be seen as contradicting mathematicians’ self-reports on proof reading, according to which they first skim the proof (Mejía Ramos & Weber, 2014; Weber, 2008). Thus, mathematicians’ behaviour may be different from what they claim they do when attempting to comprehend or validate a proof.

Students’ focussing on local aspects, in particular surface features, might not be surprising, because many students seem to lack basic knowledge. For instance, they have difficulties knowing and understanding definitions, notations, and theorems (Conradie & Frith, 2000; Moore, 1994; Reiss & Heinze, 2000). In a study on high school students’ understanding of proofs, Reiss and Heinze (2000) found that only about 9% were able to correctly define the concept of congruence and only about 11% were able to state a theorem involving congruence.

Furthermore, many students do not seem to understand the logical status of statements and the purpose of specific statements used in the proof; for instance, they have difficulties distinguishing between assumption and definition (regarding proof by contradiction) or between assumption and conclusion (Conradie & Frith, 2000). As already discussed in section 3.2.1, unpacking the logical structure, in particular interpreting and understanding the meaning and order of universal and existential quantifiers, seems to be another difficulty students encounter (Dubinsky & Yiparaki, 2000; J. Selden & Selden, 1995), which can be seen as an obstacle to proof comprehension.

Even though researchers have argued that generic proofs can improve students’ proof comprehension by making the ideas more accessible to them (Dreyfus et al., 2012; Mason & Pimm, 1984; Rowland, 2001), not many empirical studies have explicitly investigated the influence of different types of arguments on proof comprehension. Findings on the comprehension of generic proofs in comparison to ordinary proofs are not consistent so far. In a qualitative study with ten first-year engineering students from a university in Israel, Malek and Movshovitz-Hadar (2011) found that students who were presented with generic proofs (or transparent pseudo proofs, as they call them, see section 2.4.2) performed better at proof comprehension than students who received ordinary proofs. However, this was only the case for proofs involving methods with which the students were not familiar and that were based on ideas that could easily be transferred to another context. To provide more evidence for the influence of generic proofs on proof comprehension, Lew et al. (2020) conducted an experimental quantitative study in which 106 mathematics students from universities in the United States and Canada participated. Students were randomly assigned to receive either a generic proof or an ordinary proof. All participants then had to complete a proof comprehension test based on the assessment model of Mejía Ramos et al. (2012). They did not find evidence that the generic proof led to better proof comprehension than the ordinary proof. Similarly, Fuller, Weber, Mejia-Ramos, Rhoads, and Samkoff (2014) used the assessment model of Mejía Ramos et al. in a quantitative study with 300 mathematics students to investigate proof comprehension of so-called structured proofs (Footnote 11) in comparison to ordinary proofs. They could not find consistent evidence that students generally perform better on proof comprehension tests when presented with structured proofs.
Even if generic or structured proofs do not lead to better proof comprehension for mathematics university students–for which further evidence is needed–they could still potentially improve proof comprehension of high school students or at the transition from school to university.
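To make the contrast between the two proof formats concrete, consider a simple illustrative statement (the sum of two even numbers is even; chosen here for brevity, not taken from the cited experiments). An ordinary proof and a generic proof might be sketched as follows:

```latex
% Ordinary proof: general symbols, general argument.
\textbf{Claim.} The sum of two even numbers is even.

\textbf{Proof.} Let $m$ and $n$ be even, i.e., $m = 2a$ and $n = 2b$
for some $a, b \in \mathbb{Z}$. Then
\[ m + n = 2a + 2b = 2(a + b), \]
and since $a + b \in \mathbb{Z}$, the number $m + n$ is even. \qed

% Generic proof: the same argument carried out on particular numbers,
% chosen so that the general structure stays visible:
\[ 8 + 14 = 2 \cdot 4 + 2 \cdot 7 = 2 \cdot (4 + 7), \]
% where nothing depends on the particular choice of 8 and 14.
```

The generic variant is intended to make the key idea (factoring out the 2) accessible without symbolic generality; whether such transparency actually improves comprehension is precisely what the studies discussed in this section tested.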

In summary, comparatively few studies have systematically investigated students’ proof comprehension. Most students seem to focus on local rather than holistic aspects. They often lack the basic knowledge to comprehend particular statements or terms used in the proof and have difficulties with understanding the logical status of statements and unpacking the logical structure. These findings are not surprising, as similar difficulties have been reported regarding students’ comprehension of mathematical statements (see section 3.2.1). So far, the influence of different types of proofs, such as generic proofs, on students’ proof comprehension is not clear. Recent experimental studies suggest that they may not improve proof comprehension in comparison to ordinary proofs. Overall, the findings discussed in this section highlight the need for further research on students’ proof comprehension, in particular with respect to different types of arguments.

The following section outlines frameworks and research findings regarding the justification of statements. The focus is thereby on students’ so-called proof schemes.

3.2.5 Justification

Students’ ability to construct proofs on their own is seen as a major learning goal in mathematics (Hanna, 2000; Harel & Sowder, 1998; Stylianou & Blanton, 2015; Weber, 2001). Consequently, as noted in section 3.2, most of the research on proof and proving is on students’ proof construction. Many studies have shown that students at all levels as well as (prospective) mathematics teachers have difficulties with proof construction (e.g., Barkai et al., 2002; Bell, 1976; Healy & Hoyles, 2000; Hemmi, 2008; Moore, 1994; Weber, 2001).

Several reasons for these difficulties have been discussed in the literature. These include cognitive challenges, for instance, due to the fact that proving is understood as a complex activity, which requires cognitive and other skills such as problem solving (e.g., Chinnappan, Ekanayake, & Brown, 2012; Moore, 1994; A. Selden & Selden, 2013; Sommerhoff et al., 2015; Stylianou et al., 2006; Ufer, Heinze, & Reiss, 2008; Weber, 2005). As with students’ proof comprehension, basic knowledge is central for success in proof construction (e.g., Bell, 1976; Chinnappan et al., 2012; Sommerhoff, 2017; Ufer et al., 2008). Further, affective and epistemological aspects such as a lack of intellectual need for proof and missing or inappropriate conceptions of proof also seem to play a crucial role (e.g., Harel & Sowder, 1998; Tall, 1989). The latter has been explained in the literature by the fact that in the teaching of proof, not enough emphasis is put on “gradually refining students’ conceptions of what constitutes evidence and justification in mathematics” (Harel & Sowder, 1998, p. 237).

In the following, the focus is on so-called proof schemes students (and teachers) demonstrate when asked to justify mathematical statements. These are described in the following section and different categories that have been identified in the literature are discussed. After that, findings on students’ proof schemes are reviewed.

3.2.5.1 Frameworks for Students’ Proof Schemes

Several studies have been conducted to identify different types of arguments school and university students use when asked to justify a mathematical statement (Balacheff, 1988b; Bell, 1976; Harel & Sowder, 1998; Recio & Godino, 2001). Reference is sometimes made to so-called proof schemes (Harel & Sowder, 1998; Lee, 2016; Recio & Godino, 2001) to separate these types of arguments from other distinctions, such as the distinction between proofs that prove and proofs that explain made by Hanna (1990) and classifications regarding the content and method of proof made by Usiskin (1980) (see also Harel & Sowder, 1998). Harel and Sowder (1998) emphasize that the notion of proof schemes should not be interpreted “in terms of mathematical proof in its conventional sense” (p. 275), but as arguments someone is convinced by or thinks others may find convincing. Investigating and understanding students’ proof schemes is perceived by researchers as useful for better understanding students’ difficulties with proof (e.g., Balacheff, 1988b; Harel & Sowder, 1998).

An early study on students’ proof schemes was conducted by Bell (1976) with 32 pupils aged 14–15 from one grammar and two comprehensive schools. He divided students’ responses to questions about the justification of statements into two main categories, empirical and deductive arguments, which were both further divided into several subcategories (see Tab. 3.4). Bell points out that the categories partly overlap; for instance, the first subcategories are both failures to provide an argument and the last subcategories are both valid proofs. Further, it should be noted that some of the statements used in the study are about finite sets; thus, those statements could validly be proven by checking all relevant cases. Therefore, not all subcategories proposed by Bell (1976) might be relevant or useful for categorizing students’ justifications of statements about infinite sets.

Table 3.4 Categories of students’ proof explanations identified by Bell (1976, pp. 18–19)
Table 3.5 Summary of students’ main proof schemes identified by Harel and Sowder (1998, p. 245)

Another well-known and often cited study, which aimed at identifying students’ usage of arguments, was conducted by Harel and Sowder (1998). Through an exploratory study that consisted of classroom observations, interviews, and students’ homework and tests, they identified three main categories of college students’ proof schemes: external conviction proof schemes, empirical proof schemes, and analytical proof schemes, each with several subcategories. Table 3.5 provides a summary of the main categories and subcategories identified in the study. The empirical proof scheme subcategory of inductive arguments relates to what other researchers, such as Bell (1976), usually refer to as empirical arguments. In contrast to Harel and Sowder (1998), Bell (1976) did not identify perceptual arguments in his study; he mainly made distinctions within a category regarding the degree of completeness and systematization. The analytical proof schemes identified by Harel and Sowder (1998) relate to Bell’s deductive arguments, as Harel and Sowder (1998) describe analytical proof schemes as those that “validate conjectures by means of logical deduction” (p. 258). However, unlike Bell (1976), who categorized students’ deductive arguments mainly by successfulness and degree of completeness, Harel and Sowder (1998) identified two types of deductive/analytical proof schemes: transformational (which include generic proofs, for example) and axiomatic proof schemes.

Recio and Godino (2001) also conducted a study on students’ proof schemes. They categorized first-year university students’ responses into the following five categories (which were identified in an earlier study, see Recio & Godino, 1996):

  1. The answer is very deficient (confused, incoherent).

  2. The student checks the proposition with examples, without serious mistakes.

  3. The student checks the proposition with examples, and asserts its general validity.

  4. The student justifies the validity of the proposition, by using other well-known theorems or propositions, by means of partially correct procedures.

  5. The student gives a substantially correct proof, which includes an appropriate symbolization.

While Recio and Godino (1996, 2001) did not explicitly define higher-level categories, the five categories could be divided into either empirical or deductive arguments (the first category can be viewed as unclear, i.e., no clear type of argument/proof scheme can be identified; the second and third categories can be seen as empirical arguments; the fourth and fifth categories as deductive arguments), similar to those identified by Bell (1976) and Harel and Sowder (1998) (who used the term analytical or, more specifically, axiomatic).

Other researchers have built upon these three systems of categories. Kempen (2019), for instance, grounded his development of categories on Bell (1976) and Recio and Godino (2001), thus focussing on empirical and deductive proof schemes. In doing so, he introduced the category pseudo argument, referring to arguments that are circular, redundant, or simply incorrect (see p. 118). Lee (2016), on the other hand, based his categories essentially on the three main categories identified by Harel and Sowder (1998). However, Lee’s levels are not always strictly divided along the main categories proposed by Harel and Sowder, such as empirical and (incomplete or false) deductive proof schemes. For example, students who based their justification on examples and those who used incorrect logical reasoning were allocated to the same level.

In summary, the three main categories of students’ proof schemes that have been identified by at least one of the studies discussed above are external proof schemes, empirical proof schemes, and deductive or analytical proof schemes. A further distinction regarding the completeness of deductive arguments, as well as other aspects such as the subtype of proof scheme proposed by Harel and Sowder (1998), seems to be useful for analyzing students’ attempts to justify mathematical statements. In the following section, empirical findings on students’ usage of different types of arguments are summarized.

3.2.5.2 Findings on Students’ Proof Schemes

Several studies found that many school and first-year university students as well as teachers give empirical arguments when asked to justify a statement (Balacheff, 1988a; Barkai et al., 2002; Bell, 1976; Bieda, 2010; Healy & Hoyles, 2000; Housman & Porter, 2003; Lee, 2016; Recio & Godino, 2001; Sears, 2019; Sen & Guler, 2015; Stylianou et al., 2006). For instance, Recio and Godino (2001) report that about 40% of the 429 first-year university students who participated in their study gave empirical arguments to justify universal statements. Similarly, about half of the participants in a study conducted by Barkai et al. (2002) with 27 elementary school teachers used empirical arguments to prove a universal statement. In line with the categories proposed by Bell (1976), the nature of empirical arguments used by students (and teachers) seems to differ in that some of them only choose random examples while others search for patterns (Stylianou et al., 2006). In a study conducted by Housman and Porter (2003) with eleven above-average mathematics students, the majority of participants used perceptual arguments; only one student also made use of inductive arguments (both as defined by Harel & Sowder, 1998). But some students seem to have difficulties producing even an empirical argument, as Bell (1976) found: About one fourth of the participants were allocated to the respective first subcategory of empirical arguments (see Tab. 3.4). Those students were not able to generate correct examples, which Bell (1976) attributes to a lack of knowledge and “an inability to coordinate all the data” (p. 34). Another 19% of the participants in Bell’s study checked all relevant cases regarding a statement about a finite set, thus proving the correctness of the statement by exhaustion. However, none of the students gave a complete explanation for either of the statements, and only one student was able to give some (deductive) explanations, even though these were incomplete.

There is strong evidence that many students as well as teachers fail to construct valid deductive arguments (Barkai et al., 2002; Bell, 1976; Healy & Hoyles, 2000; Kempen, 2019; Lee, 2016; Recio & Godino, 2001; Sears, 2019; Sen & Guler, 2015; Sevimli, 2018; Stylianou et al., 2006). Most of the respective studies report that less than half of the participants justified true universal statements (and false existential statements) with a complete and correct proof (e.g., Barkai et al., 2002; Kempen, 2019; Lee, 2016; Recio & Godino, 2001). Findings suggest that many first-year university students often construct pseudo arguments (e.g., Kempen, 2019; Stylianides & Stylianides, 2009). For instance, Kempen (2019) reports that about 26% of the 149 preservice teachers who were asked to justify a statement about the sum of two even numbers used such incorrect deductive arguments; about 23% gave an incomplete deductive argument, about 20% constructed a valid proof, about 9% gave empirical arguments, 8% gave no justification at all, and about 13% seemingly did not answer. The percentages differed substantially by semester. For instance, first-semester preservice teachers more often gave empirical arguments (about 14%) and less often incomplete or complete deductive arguments (about 9 and 10%, respectively). Although research findings are generally consistent regarding students’ difficulties with correct justifications, success at constructing proofs seems to depend on several factors such as the respective statement (universal vs existential, truth value, mathematical context, etc.) as well as age (see also Reid & Knipping, 2010). For instance, the statements used in Barkai et al. (2002) consisted of three universal statements (one true, two false) and three existential statements (two true, one false), all in the context of divisibility.
Depending on the statement, the percentage of teachers who gave correct justifications (either proving or disproving the statement) varied between about 23% (regarding the false existential statement) and 96% (regarding a true existential statement). The true universal statement could only be proven by about 40% of the participants; the success rates regarding the two false universal statements were significantly higher, at 69% and 88%. In those cases, the statements could be disproven by providing just one counterexample, which is an easier task than producing a deductive argument that holds for infinitely many cases. Similarly, proving a true existential statement only requires finding one example; proving the falsity of an existential statement requires constructing a general deductive argument.
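This asymmetry in difficulty reflects the duality of quantifiers and can be stated explicitly:

```latex
% Disproving a universal statement requires only one counterexample:
\[ \neg\, \forall x\; P(x) \;\Longleftrightarrow\; \exists x\; \neg P(x) \]
% Disproving an existential statement requires a general argument:
\[ \neg\, \exists x\; P(x) \;\Longleftrightarrow\; \forall x\; \neg P(x) \]
```

Thus, refuting a false universal statement and verifying a true existential statement both reduce to exhibiting a single example, whereas the remaining two cases demand an argument covering all cases.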

Fewer studies have reported results on students’ use of external conviction proof schemes (Harel & Sowder, 1998; Sears, 2019; Sen & Guler, 2015; Sevimli, 2018; Stylianou et al., 2006). In a study conducted by Stylianou et al. (2006, p. 57) with 34 first-year mathematics students, about 20 to 35% (depending on the task) gave externally based arguments. These justifications were mainly based on symbolic manipulations or a redesign of the statement, but not on authority (e.g., a textbook or teacher). Sevimli (2018) also reports on students’ usage of external arguments. Most of the justifications given by his participants (172 first-semester students from three different mathematics departments in Turkey) belonged to the external proof scheme. In contrast to the findings reported by Stylianou et al., these students particularly made reference to authority. School students also often refer to authority, as Sen and Guler (2015) found in a study with 250 7th-grade students from Central Anatolia. Most of the participants used either external or empirical arguments to justify mathematical statements; the externally based arguments were mainly authoritarian and ritual (but, in particular for one statement, also symbolic). Sears (2019) reports on a (small-scale) study with similar results, but regarding the justification of statements given by preservice middle and secondary school teachers. The six participants mainly used external and empirical arguments, and reference was often made to authority. However, due to the small number of participants, these findings are not generalizable.

Several studies have investigated whether students and teachers who give empirical arguments to justify universal statements are aware that these types of arguments do not meet the requirements of proof (Barkai et al., 2002; Stylianides & Stylianides, 2009). Findings suggest that many (if not most) participants are aware that a general argument is needed. For instance, in the study conducted by Barkai et al. (2002), about 20% of the 27 participating elementary school teachers stated that they knew that a general proof is needed, but that they lacked the necessary knowledge to construct such an argument. Similarly, in a study conducted by Stylianides and Stylianides (2009) with 39 prospective elementary school teachers, most of the participants who submitted empirical arguments were aware that their arguments were not valid as proofs. Thus, as Weber and Mejia-Ramos (2015) pointed out, “behaviour on justifications tasks is [not] a sufficient warrant to establish this claim [referring to the claim that students are convinced by empirical arguments]” (p. 16). Not all students and teachers who use empirical arguments to justify a universal statement may do so because they think these are sufficient to prove a statement; some are simply not able to construct a valid proof. However, as discussed in section 3.2.3, some students and teachers do think empirical arguments are sufficient (see, e.g., Martin & Harel, 1989).

Overall, students as well as teachers have difficulties with the construction of ordinary proofs. Several aspects may influence students’ success in constructing valid arguments, for instance, the truth value of the statement. Further, most students give empirical arguments when asked to justify the truth of a statement, potentially because of their inability to construct ordinary proofs rather than because they assume that these arguments are sufficient. Some studies have reported on (high school) students’ external proof schemes such as authoritarian arguments. However, it is unclear which characteristics of the statements (e.g., truth value or familiarity) and of the students (e.g., age or experience) influence the use of these and other types of arguments, which highlights the need for further research.

In the following section, potential relations between proof-related activities and respective research findings are discussed.

3.2.6 Relation Between Activities

As described in section 3.2, the three main activities reading a statement, reading an argument, and constructing an argument are related in that both reading and constructing an argument may support the comprehension of a statement as well as estimating its truth (see relations marked as A in Fig. 3.3). Vice versa, without some understanding of the statement, it is neither possible to decide if the statement is true or false nor to understand and evaluate a given argument or construct a novel one (the latter relations are marked as B in Fig. 3.3; see also discussions in sections 3.2.2 and 3.2.4, respectively).

Figure 3.3

Adapted framework on proof-related activities based on Mejía Ramos and Inglis (2009b), highlighted relations

Further, it seems plausible that proof evaluation, in particular with respect to conviction, and proof comprehension are related: If students do not understand an argument, they may judge it as not convincing, which in turn could influence the estimation of truth of the statement (see also Weber & Mejia-Ramos, 2015).

There are only a few studies that have investigated the relations between the different (sub-)activities, as other researchers have already highlighted (e.g., A. Selden & Selden, 2017; Sommerhoff, 2017). As discussed in section 2.3.3, the acceptance of an argument as proof is highly influenced by the context and the individuals who read or construct a proof, that is, by socio-mathematical norms. The respective acceptance criteria can influence not only the evaluation of given arguments but also the construction of novel ones. This assumption is supported by research that found (albeit weak) correlations between proof validation and proof construction (Ufer et al., 2009). Further, studies suggest that engaging in proof validation activities may positively influence proof construction (Pfeiffer, 2011; Powers et al., 2010; A. Selden & Selden, 2003; Yee et al., 2018). Findings reported by Sommerhoff (2017) suggest that the correlation between proof validation and proof construction is not mainly based on methodological knowledge (e.g., knowledge about appropriate proof schemes, see A. Heinze & Reiss, 2003), but rather an effect of shared resources that underlie both activities (see also the following section).

Table 3.6 Teachers’ estimation of truth and correct justification by truth value and domain of discourse of statements; findings reported by Barkai et al. (2002), table adapted from Reid and Knipping (2010, p. 70), with permission from Brill

Of particular interest in the present thesis are the relations between the comprehension of a statement (in particular its generality) and the estimation of truth on the one hand, and proof comprehension as well as justification on the other. Findings reported by Barkai et al. (2002) suggest a relation between the estimation of truth and the justifications given by the participating teachers, in particular with respect to the truth value and domain of discourse of the statement (see Tab. 3.6, an extended version of Tab. 3.3). The first universal statement was correctly estimated as true by all teachers, but only 41% constructed correct arguments to prove it. Most of the incorrect arguments were empirical (about 50%), which might imply that the empirical verifications convinced the teachers of the truth of the statement even though they could not prove it (about a third of the teachers thought the empirical arguments count as proof). The truth value of the second statement (which was false) was also correctly estimated by all teachers, but significantly more teachers were able to provide a correct proof (most gave one or more counterexamples). In contrast, the third statement was correctly estimated as false by fewer teachers (69%); however, all of these teachers were able to justify their decision correctly by providing one or more counterexamples. It seems that it was easier for the teachers to disprove a false universal statement (by providing a counterexample) than to prove a true universal statement, at least for those who were able to correctly estimate the truth value. Similar observations can be made regarding the existential statements: Most of the teachers who correctly estimated the truth value of the true existential statements were able to provide a correct justification (by giving an example). But the false existential statement was proven by only 23% of the participants, most likely because a general argument was needed (see also Reid & Knipping, 2010, p. 70).
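This asymmetry follows directly from the logical duality of the quantifiers; the following sketch (an illustration added here, not part of Barkai et al.’s analysis) makes the underlying logic explicit:

```latex
% Disproving a universal statement requires only a single counterexample:
\neg \forall x\, P(x) \;\Longleftrightarrow\; \exists x\, \neg P(x)
% Disproving an existential statement requires a general argument
% covering the entire domain of discourse:
\neg \exists x\, P(x) \;\Longleftrightarrow\; \forall x\, \neg P(x)
```

In other words, refuting a false existential statement is structurally equivalent to proving a universal statement, which is consistent with the low success rate (23%) observed for this task.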

To the author’s knowledge, no studies on the relation between understanding the generality of a statement (as part of reading a statement, in particular, comprehension of a statement) and proof reading or proof construction have been conducted so far. In general, more research is needed to understand the interplay between the different (sub-)activities related to proof.

The following section aims to identify appropriate control variables regarding (cognitive) resources that underlie students’ proof skills and potentially their understanding of the generality of statements.

3.3 Resources

In research on argumentation and proof skills, several underlying resources have been identified and investigated (Chinnappan et al., 2012; Sommerhoff, 2017; Ufer et al., 2008; Weber, 2001). As Sommerhoff (2017) pointed out, no general framework or list of these resources exists so far. This section aims to give an overview of potential resources that might be relevant for argumentation and proof skills and could therefore be used as control variables in the analysis of students’ understanding of generality and other proof-related activities. The focus is on cognitive resources rather than non-cognitive ones such as motivation and beliefs.

The resources identified in the literature so far can mainly be categorized into content-specific, domain-specific, and domain-general resources (see Sommerhoff, 2017; Ufer et al., 2008). Different terms are sometimes used to refer to similar resources (e.g., mathematical knowledge base and mathematical content knowledge); here, the terminology is taken from Sommerhoff (2017). Content-specific resources refer to knowledge that belongs “to a specific mathematical content area” (Sommerhoff, 2017, pp. 45–46). This comprises conceptual knowledge (e.g., knowledge about concepts and definitions) as well as procedural knowledge (e.g., knowledge about rules and procedures). Domain-specific resources, such as mathematical strategic knowledge (first introduced by Weber, 2001) and methodological knowledge (see above), are not specific to a particular mathematical content but belong “to the [general] field of mathematics” (Sommerhoff, 2017, p. 46). In contrast, domain-general resources are not specific to mathematics and include problem-solving skills and (general) reasoning skills (Sommerhoff specifically refers to conditional reasoning skills, i.e., the reasoning skills needed to handle conditional statements).

According to the literature review conducted by Sommerhoff et al. (2015) (see also section 3.2), mathematical content knowledge was considered most often in PME research reports on proof and argumentation (about 47%), whereas methodological knowledge was studied in only 17% and problem-solving skills in 18% of the reports. Only a few research reports investigated other resources such as mathematical strategic knowledge or beliefs (3–5%). Conditional or more general reasoning skills were seemingly not analyzed in any of the research reports considered in the review.

Overall, research findings suggest a strong impact of content- and domain-specific resources on activities related to proof and argumentation (Sommerhoff, 2017). In particular, mathematical content knowledge seems to be a main predictor for students’ performance in proof construction, as several studies have shown (Chinnappan et al., 2012; Sommerhoff, 2017; Ufer et al., 2008). Further, Weber (2001) reports that a lack of mathematical strategic knowledge is “a primary cause for undergraduates’ failure” (p. 115) in proof construction. The quantitative study conducted by Sommerhoff (2017) provides further evidence for the influence of mathematical strategic knowledge on students’ performance in proof construction and validation.

The importance of problem-solving skills for proof competencies and the relation between them have been highlighted by many researchers (e.g., Chinnappan et al., 2012; Selden & Selden, 2013; Stylianou et al., 2006; Ufer et al., 2008; Weber, 2005). However, their actual influence on students’ performance in proof-related activities is not yet clear. Several studies found significant correlations (Chinnappan et al., 2012; Ufer et al., 2008), while Sommerhoff (2017) reports a low and nonsignificant effect on students’ performance in proof construction. Sommerhoff assumes that mathematical strategic knowledge, which was included in the regression model, “reduces the impact of problem-solving skills, as mathematical strategic knowledge can partially be seen as a domain-specific analogue of problem-solving heuristic” (p. 89). However, he also found indications of a correlation between performance in proof validation and problem-solving skills. Further research is needed to better understand the impact of students’ problem-solving skills on proof performance.

Regarding mathematical reasoning skills, Chinnappan et al. (2012) report a significant influence on students’ success in proof construction. However, mathematical reasoning skills were measured using a geometry test conducted at the end of Grade 10. Even though this test requires deductive reasoning, the skills needed to solve the tasks seem to overlap with other resources such as mathematical content knowledge. According to findings reported by Sommerhoff (2017), conditional reasoning skills (measured by questions in which participants had to accept or reject logical inferences) seem to “play a minor role compared to the domain-specific resources” (p. 90).

To my knowledge, other more general cognitive measures have not been considered as potential resources underlying argumentation and proof skills so far. In section 5.3.6, the so-called Cognitive Reflection Test (CRT) is introduced as an instrument to control for individual differences in cognitive resources.