In recent years, research in mathematics education at university level has received increasing interest, as shown by the work of specific groups in the most relevant international conferences in the area, such as CERME, ICME, INDRUM, or RUME (Footnote 1). However, contributions specifically focused on geometry at university level are scarce, apart from those related to prospective secondary school teachers. Overall, this body of research has increased the knowledge of mathematical learning processes at tertiary level.

Among these processes, we are interested in the teaching and learning of geometry. The Van Hiele model has been widely used both to study the level of geometric thought of different groups of students or teachers (Arnal-Bailera & Manero, 2021; González et al., 2022; Jaime & Gutiérrez, 1994; Manero & Arnal-Bailera, 2021; Pandiscio & Knight, 2010; Wang & Kinzel, 2014) and to design learning activities (Abdullah & Zakaria, 2013; Corberán et al., 1994; Guillén, 2004; Howse & Howse, 2014; Wahab et al., 2017). However, few works concern the fifth level (Blair, 2004; Mayberry, 1983; Usiskin, 1982). To deepen the knowledge of this level of thought, we decided to rely on the subjective collective judgment of experts, in this case geometry experts, via the Delphi methodology.

The aim of this study is to gain an understanding of the fifth level as a basis for a line of research on the possible effectiveness of an instructional approach to strengthen the acquisition of the previous levels (Blair, 2004; Martin, 2008). In particular, we are thinking about ways of increasing the degree of acquisition of the fourth level among mathematics majors and pre-service mathematics teachers (Manero & Arnal-Bailera, 2021; Demiray & Işiksal, 2017; Pandiscio & Knight, 2010; Sears, 2019). These ideas show the relevance of describing level 5 in terms of a set of indicators (sentences expressing the different abilities of a person at this level). These indicators could serve as a starting point for designing level 5 activities that could increase the degree of acquisition of level 4 among these students. In terms of instructional design, the indicators could also help to define the goals of a high-level geometry course. Our research question is: Which characteristics best describe the highest level of reasoning according to the Van Hiele theory?

In order to answer this question, we establish the following specific objectives:

  1. Design a list of indicators for every key process present in the fifth Van Hiele level

  2. Validate the relevance of the indicators obtained to describe the fifth Van Hiele level

Theoretical Framework

Van Hiele Model

The Van Hiele model (Van Hiele, 1957) has been one of the most relevant theoretical frameworks concerning teaching and learning geometry at all educational levels. This model states the existence of five different levels of geometric reasoning (Burger & Shaughnessy, 1986; Hoffer, 1983; Jaime & Gutiérrez, 1990; Van Hiele, 1986) that can be summarized as follows:

  • Level 1 (visualization). Students at this level recognize geometric figures by their appearance and as a whole. Also, they describe figures using their physical characteristics or comparisons with everyday objects by means of a nonmathematical language

  • Level 2 (analysis). This level is characterized by the students’ ability to handle the parts and properties of figures, which allows them to deal with mathematical descriptions of geometric concepts

  • Level 3 (informal deduction). Reasoning at this level uses logical deductions for the first time, which enables students to interrelate properties of geometric figures. Thus, these students can understand logical classifications of families of figures, construct definitions as sets of necessary and sufficient conditions, and provide some general arguments to justify the validity of a mathematical statement

  • Level 4 (formal deduction). Students at this level can produce formal proofs and deal with equivalent definitions of a concept

  • Level 5 (rigor). Students at this level can compare systems based on different axioms and can study several geometries in absence of concrete models

The main characteristic of these levels is their sequential and hierarchical nature, meaning that they are acquired in a specific order throughout the learning process.

Although the Van Hiele model contains five levels, almost all the related literature has focused on the first four. This lack of work on the fifth Van Hiele level can be explained by the fact that very few students start the acquisition of level 4 during their school years. In particular, according to Gutiérrez and Jaime (1995), only 22.6% of students in the last year of secondary school show a certain degree of acquisition of level 4.

Among the very few studies in which level 5 has been considered, the works of Usiskin (1982), Mayberry (1983), and Blair (2004) stand out. Usiskin (1982) designed a test to determine the Van Hiele level of students. That test contains 25 questions, some of which address level 5. Mayberry (1983) gave some ideas about the properties that the questions of a Van Hiele test must have for every level. In particular, the author pointed out that level 5 questions must be related to propositions about finite geometries; this idea is consistent with the questions proposed by Usiskin. Level 5 is also considered in the work of Blair (2004). Specifically, the author describes tasks involving classical geometry but with non-conventional metrics (such as Taxicab geometry) as a possible way to develop level 5.

Key Processes

Some authors, such as De Villiers (1987), describe the Van Hiele levels in terms of the different processes involved in them. Jaime and Gutiérrez (1994) present a description of the Van Hiele levels (from 1 to 4) organized around four key processes: definition (use and formulation of definitions of geometrical objects); proof (the way of convincing ourselves or someone else of the truth of a statement); classification (sorting geometrical objects into different families or creating a new group of families to sort the objects); and identification (establishing the family to which a particular geometrical object belongs). Historically, these are the processes that have been considered in order to describe the Van Hiele levels (Burger & Shaughnessy, 1986; De Villiers, 1987; Gutierrez & Jaime, 1998). Currently, there is a consensus among the Van Hiele research community that these four processes are the crucial ones to consider as indicators for the Van Hiele levels; for example, recent studies that consider the potential of the Van Hiele framework for Graph Theory have considered these four levels (Manero & Arnal-Bailera, 2021; González et al., 2021).

Concerning the definition process, level 1 definitions consider only global physical properties or consist of descriptions based on everyday objects, like “a rectangle is something like a window”. Students at level 2 can state definitions as a list of mathematical properties, but they do not check whether the list contains all the necessary and sufficient ones. Level 3 definitions are stated as a set of necessary and sufficient mathematical properties. Students at level 4 accept equivalent definitions for the same mathematical object and are able to prove the equivalence of both definitions by double implication. Previous research has provided some key insights into high-level reasoning: Martín-Molina et al. (2018), in a study of the way professional mathematicians define, present defining as embedded in a more general process of generalizing, while Larsen and Zandieh (2008) point to proof as a motivation for defining when working across different geometries.

Proofs at level 2 consist of verifying the property or statement in question in one or several concrete examples. Students at level 3 are able to produce proofs that only require a few logical steps. The arguments given to explain the truth of a statement are based on mathematical properties but are usually informal or lack rigor. In contrast, level 4 students understand proofs and are able to produce formal proofs. People at this level are aware that proving a proposition requires a sequence of implications based on already established properties. Previous research has provided some key insights into high-level reasoning: Fernández-León et al. (2020) describe the ability of a researcher to select and apply different proving methods; Weber and Mejia-Ramos (2011) showed that mathematicians will eventually construct sub-proofs (e.g. lemmas) to help the understanding of the proof; finally, Boero (1999) distinguishes between mathematical proof as a product and proving as a (cyclic) process. Working across geometries is a situation that could promote the comparison of several geometries. This comparison could be based on the study of the transferability of a demonstration (Blair, 2004).

Level 1 classifications are produced attending to global physical properties. Students at level 2 are able to classify geometrical objects into disjoint families attending to their mathematical properties. Inclusive classifications (classifications into families with relations between them) may be done at level 3. Regarding the identification process, it is characterized at level 1 by the recognition of geometrical objects based on their global appearance and physical properties while at levels 2, 3, and 4, this process is based on mathematical properties. The classic descriptions of the Van Hiele levels do not include specifications about what is or what is not classification and recognition at level 5. Moreover, these processes are supposed to be fully developed at lower levels (Jaime & Gutiérrez, 1994). However, it is reasonable to think that working in different geometries or with different metrics could develop different skills to identify or classify objects or environments.

Delphi Method

The Delphi technique (Linstone & Turoff, 1975) is a methodology whose objective is to discuss and reach consensus about a certain question through the iterative administration of a series of questionnaires to a panel of experts. The use of the Delphi method in this research is justified by two facts: there is little literature on the topic, and there are many experts in the field of geometry at tertiary level. Thus, a subjective collective judgment of experts can be an appropriate way to obtain reliable information.

Since the Delphi method allows us to obtain a reliable consensus of opinion from a group of experts, it has been used in many different areas of the social sciences, such as economics, policy making, or social services. It has also been useful in educational settings because it makes it easier to form guidelines and standards or to predict trends (Green, 2014). For example, in the last decade, it has been a common tool to develop and validate the competencies that teachers should have acquired, both for primary school science teachers (Alake-Tuenter et al., 2013) and for secondary school mathematics teachers (Muñiz-Rodríguez et al., 2017).

This method can be described as follows: A questionnaire is designed by a small monitor team (usually called the research group), which sends it to a larger respondent group (also called the panel of experts). Once the completed questionnaires are received, the research group summarizes the answers and, based on them, develops a new questionnaire for the panel of experts, who evaluate the ideas presented in it. In successive phases, the panel reevaluates the same or a modified questionnaire, which also includes controlled feedback on the group responses to the previous assessments. This iterative process finishes when a termination criterion is met (Linstone & Turoff, 1975).

The implementation of this method can be summarized explaining the initial preparation, the administration of questionnaires, and the termination criterion.

The initial preparation requires determining the expert panel and preparing the first questionnaire. Regarding the panel, Hsu and Sandford (2007) state that people with experience and a related background on the issue can be considered eligible, and heterogeneity among the members of the panel is desirable. The number of panelists is in general lower than 50 (Witkin & Altschuld, 1995), and most Delphi studies include 15–20 experts (Ludwig, 1997). The first questionnaire is traditionally formed by open-ended questions; however, a structured questionnaire can be used in the first phase if it is obtained after an exhaustive review of the related literature (Hsu & Sandford, 2007).

After the first questionnaire is completed and sent back, the research group analyzes the experts’ productions. This analysis consists of a revision of their answers in order to clarify them, avoid duplications, and classify them into different categories. With the results of this analysis, a structured questionnaire is designed. Afterwards, the expert panel is asked to evaluate the relevance of the different items presented there, generally using a Likert scale (Cox, 1996). In the following phases, the panel of experts receives some controlled feedback, possibly including all the panelists’ answers to the previous phase (Andranovich, 1995; Scheibe et al., 1975) or just some measures of centrality; in the latter case, the use of the median is preferred in the literature (Eckman, 1983; Hill & Fowles, 1975; Jacobs, 1996). Along with this feedback, the panelists are asked to reevaluate the items.

Regarding the termination criterion, Linstone and Turoff (1975) established that three phases could be sufficient to complete the study. Another termination criterion consists in finishing the study when consensus is achieved. Usually, consensus is said to have been obtained when a high percentage of ratings concentrates in a concrete range (Miller, 2006). Another finishing criterion can be validity: A certain item or idea is considered to be validated when there exists consensus about its relevance or convenience. Other authors (Scheibe et al., 1975) consider stability (no significant differences in the evaluations of consecutive phases) in order to terminate the Delphi study.
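As an illustration of the feedback and consensus ideas above, the following Python sketch computes the median of one item's ratings and the share of ratings that fall in a chosen range. The function names, the sample ratings, the range, and the threshold are hypothetical choices made for this example; they are not prescribed by the cited Delphi literature.

```python
from statistics import median


def summarize_item(ratings, low, high):
    """Return the median rating and the share of ratings in [low, high]."""
    in_range = sum(1 for r in ratings if low <= r <= high)
    return {"median": median(ratings), "share_in_range": in_range / len(ratings)}


def has_consensus(ratings, low, high, threshold):
    """Consensus holds when the share of ratings in [low, high] reaches the threshold."""
    return summarize_item(ratings, low, high)["share_in_range"] >= threshold


# Hypothetical ratings of one item by ten panelists on a four-point scale.
example_ratings = [4, 4, 3, 4, 2, 4, 4, 3, 4, 4]
summary = summarize_item(example_ratings, low=3, high=4)
print(summary)
print(has_consensus(example_ratings, low=3, high=4, threshold=0.8))
```

The range and the threshold are left as explicit parameters because the literature cited above does not fix a single convention for them.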


Application of the Delphi Method

The questionnaires were administered iteratively online, specifically using Google questionnaires. The number of phases was fixed in advance at three, as stated by Linstone and Turoff (1975).

The first questionnaire was validated by two professors of mathematics education who are experts in the Van Hiele model, neither of whom was involved in this work.

In the first phase, a personalized email was sent to the participants including a cover letter and a link to the questionnaire. The letter presented the study and its objectives, as well as a brief description of the Delphi method and a schedule of the phases.

At the beginning of the questionnaire, we included an introduction to the Van Hiele model (see Fig. 1). The questionnaire consisted of five questions. The first four questions were related to the four key processes (Burger & Shaughnessy, 1986), and each was introduced by a short description of the corresponding process at each level, following the one given in Gutiérrez and Jaime (1995). After this introduction, we posed the following open question: Regarding this process, what elements, capabilities … describe someone who is at level 5 of geometric reasoning? Thus, the panel of experts was asked to write comments that, in their opinion, described the different processes at level 5. The last question was: If you consider that Van Hiele level 5 involves processes other than those described above, you can indicate it here. This question was intended to explore other possible processes involved in the highest Van Hiele level.

Fig. 1 Introduction to the Van Hiele model presented in the questionnaire

Finally, the research group analyzed all these comments following a thematic analysis (Clarke & Braun, 2016): We clustered them into similar categories, clarified ambiguous wordings, discarded some that were inappropriate, and removed duplicates. As a result, we obtained the first list of indicators.

In the second phase of the Delphi method, the experts were asked to evaluate the relevance of the indicators obtained after phase 1 using a four-point Likert scale ((1) no relevance; (2) insignificantly relevant; (3) relevant; and (4) highly relevant). Experts were allowed to make comments in order to justify their ratings.

In order to validate the different indicators, we followed an approach similar to the one described in Muñiz-Rodríguez et al. (2017). We classified the indicators into three groups according to their validity, i.e. the consensus about their high relevance:

  • Validated. When at least 70% of the experts’ ratings were 4

  • Reevaluate. Between 40 and 70% of the experts’ ratings were 4

  • Rejected. When less than 40% of the experts’ ratings were 4

The consensus percentage has been established at 70% since according to Hsu and Sandford (2007) and Green (2014), this is the percentage used to determine consensus for a four-point Likert scale. Then, if there exists consensus concerning the high relevance of a certain indicator, we consider it as validated by the panel of experts.
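The classification rule above can be sketched in Python as follows. This is only an illustrative sketch: the function name and the sample ratings are hypothetical, and treating exactly 40% as "reevaluate" is one reading of the boundaries stated in the list.

```python
def classify_indicator(ratings):
    """Classify an indicator from its four-point Likert ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    n = len(ratings)
    top = sum(1 for r in ratings if r == 4)
    # Compare integers (top / n >= 70 / 100) to avoid floating-point edge cases.
    if top * 100 >= 70 * n:
        return "validated"
    if top * 100 >= 40 * n:
        return "reevaluate"
    return "rejected"


# Hypothetical panels of 25 experts with 18 (72%), 10 (40%), and 9 (36%) top ratings.
print(classify_indicator([4] * 18 + [3] * 7))   # validated
print(classify_indicator([4] * 10 + [3] * 15))  # reevaluate
print(classify_indicator([4] * 9 + [2] * 16))   # rejected
```

Only ratings equal to 4 count towards the percentage, since validity is defined here as consensus about high relevance.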

In addition to validation, it would be interesting to follow a subsequent process that would detect gaps or repetitions among the indicators or allow them to be written in a more functional way for their use in teaching. We refer to a process that would include the design of tasks for university mathematics students in which it would be possible to detect the presence of such indicators.

The questionnaire of the third and final phase asked for the rating of the indicators that had been neither validated nor rejected in the previous phase. As Andranovich (1995) and Scheibe et al. (1975) recommend, we included all the previous ratings of these indicators in the questionnaire. For the indicators that had received relevant comments in the second phase, we provided an alternative wording to be rated.


According to Hsu and Sandford (2007), the panel of experts must be a heterogeneous sample of people with expertise or experience in the issue under study. In this regard, the sample was designed to be representative of the population of geometry experts. Specifically, we considered only mathematicians with expertise in geometry and topology. The whole sample of our study consisted of 25 university teachers, all of them holding a PhD in geometry and topology. A small group of the respondents (5 out of the 25 teachers) had prior experience in mathematics education, having held positions in a university mathematics education department or in secondary education. Thus, our sample combines the required expertise in mathematics with teaching experience at different educational levels.

The respondents to our questionnaires belong to different European universities, mostly in Spain. They were chosen for the following reasons: accessibility, since they were easy to contact through their university websites; relevance of their research, since the respondents belong to research groups with an important presence in conferences such as ICM (Footnote 2) and a large number of indexed publications; heterogeneity of their research, since the respondents belong to research groups focused on different sub-fields of geometry: algebraic geometry, differential geometry …; and commitment, since they agreed to answer the different questionnaires and to participate in the study until its completion. In some cases, this commitment was related to a professional relationship with the authors; note that the Delphi literature establishes the importance of a motivated panel in order to obtain high-quality responses (Andranovich, 1995).

We have not found worldwide or European statistics on the number of geometry and topology teachers at tertiary level. However, to gain some understanding of the size of the intended population, we considered that in the last 30 years, only 453 theses related to the area of geometry and topology were defended in Spain (Footnote 3). Moreover, in this country, there are only around 200 university teachers of geometry and topology (Footnote 4).

The sample is balanced in terms of gender and university position (see Table 1). Considering the teaching area of the experts, there are 15 teachers of geometry and topology, 7 of other areas of mathematics (mainly applied mathematics), and 3 part-time university/secondary school teachers. Out of the 25 teachers, 19 are currently enrolled in research projects (with research papers published in the last 5 years in the area of geometry and topology), while the other 6 are mainly dedicated to teaching or management tasks.

Table 1 Distribution of the sample by gender and position

The panel of experts involved two different groups, A and B, formed by 14 and 11 experts respectively. Group A participated in all three phases of the study, while group B’s participation started in the second phase. This incorporation of new members into the panel of experts during the study helps to increase the validity and reliability of the results (Linstone & Turoff, 1975; Muñiz-Rodríguez et al., 2017). Two of the participants did not answer in the third phase of the study; thus, the number of respondents was different in every phase: 14, 25, and 23 respectively.


First Phase

By condensing all the respondents’ answers to the first phase questionnaire, we formulated a list of indicators (see Table 2). These indicators express potential competences of a person reasoning at level 5.

Table 2 Phase 1 results. List of indicators

After receiving the answers, the researchers performed a process of classifying the 81 sentences according to their content on each of the four processes: 27 for definition, 25 for proof, 15 for classification, and 14 concerning identification.

In the case of the definition process, the comments were grouped into six indicators (see Table 2). We describe how the definition indicator Def2 (Footnote 5) was built from the three comments in Fig. 2. Respondent 9 (R9) explains that, in different environments, the defined object can have different properties. R4 points out something similar but uses the term mathematical structures-theories instead of environments. Finally, R1 focuses on the similarities and differences of the same definition when considered in different axiomatic systems. Thus, indicator Def2 tries to summarize that people at level 5 are able to understand that the properties of a defined object depend not only on its definition but also on the geometric context (Footnote 6) in which it is defined.

Fig. 2 Design of a definition indicator

This indicator can be exemplified as follows: If we define the sphere as the set of points of \(\mathbb{R}^{3}\) that are at the same distance from a certain point, indicator Def2 shows that someone at level 5 is able to understand that the properties of such an object depend not only on its definition but also on the metric (Euclidean, Taxicab …).

Concerning the proof process, we obtained 25 comments in phase 1 that were summarized into five indicators (see Table 2). In order to illustrate the indicator construction process, we describe how the indicator Pro5 was built (see Fig. 3). Respondent R6 remarked that proving at level 5 can be done even if the definitions of new concepts are required. Another respondent, R9, described that a person at this level is able to structure a demonstration into partial interesting results as lemmas. Finally, respondent R11 highlighted that a relevant characteristic of level 5 proofs was to decompose long proofs into definitions, lemmas, and other proofs. From all these comments, we designed indicator Pro5 based on the idea that proofs at this level can be structured introducing different lemmas or definitions.

Fig. 3 Design of a proof indicator

The idea of structuring a proof into different interesting results such as lemmas is very common among professional mathematicians. There are also many examples of mathematical objects that have been defined in order to state or properly structure a theorem or its proof. A classic example of this idea in differential geometry is Ado’s Theorem, which states that every finite-dimensional Lie algebra admits a faithful finite-dimensional representation. This result can be found in classical works of geometry such as Varadarajan (1984) or Lee (2013, p. 199). In both cases, the definition of the kind of representation involved appears only a few lines before the theorem, showing that it has been introduced only because it is required to state and prove the theorem.

In the case of the classification process, we obtained 15 comments that were grouped into four indicators (see Table 2). Respondent 1 highlighted the competence of comparing classifications in different geometries, while respondent 13 stated that the careful specification of the equivalence relation is a key element in this process at the fifth level. From these comments, we designed indicator Cla2, which points to the understanding that the classification process is relative to the geometric context (see Fig. 4).

Fig. 4 Design of a classification indicator

This indicator tries to underline the importance of the geometric context when classifying objects. For example, as R13 points out, from a differential point of view, an ellipsoid and a sphere are the same object while this is no longer true if we consider a metric point of view since the curvature of both objects is different.

In the case of the identification process, we obtained 14 comments that were grouped into four indicators (see Table 2). As an example, R3 commented on the relevance of the Reidemeister moves to identify knots, while R8 considered that some objects can be identified by studying the possibility of constructing new objects from the given ones via, for example, fiber bundles. From these comments, we designed indicator Ide1, which expresses the ability to recognize objects through transformation processes (Fig. 5).

Fig. 5 Design of an identification indicator

There are several examples of these transformation processes in advanced geometric work. For example, in knot theory, a knot can be represented in very complicated ways; however, the Reidemeister moves are transformations that preserve the knot and may simplify its presentation, which eventually allows us to identify it more easily. Another example, from Riemannian geometry, consists of identifying a six-dimensional manifold \((M,g)\) as Calabi-Yau if the product manifold \(N = M \times \mathbb{R}\), endowed with the cone metric \(h = g + dt^{2}\) (with \(t\) the coordinate in \(\mathbb{R}\)), is a parallel \(G_{2}\) manifold (Boyer & Galicki, 2007, p. 544), which can be recognized in terms of, for example, the behavior of its fundamental 3-form.

We can see in Table 2 the initial list with the 19 indicators obtained after the first phase: The processes of definition and proof have been described through six and five indicators respectively while the processes of classification and identification have been described only with four indicators each.

A review of the 19 indicators shows that related competences or topics appear in different processes. There are three indicators related to the comparison of definitions, proofs, or classifications. There are indicators highlighting the relevance of different contexts in all the processes. Finally, we designed several indicators that point to the conceptual differences between the processes at this level and at the previous levels. Examples include being aware of the need to define in order to introduce a new mathematical object (Def4) and the ability to perform mixed formulation-proof processes (Pro4).

Second Phase

After designing the list of indicators, we administered another questionnaire to assess their relevance to describe the fifth level. We present now the different indicators with the percentage of persons awarding the maximum degree of relevance to each of them (Table 3).

Table 3 Percentage of persons awarding the maximum degree of relevance—phase 2

Some of the 13 indicators to be reevaluated received comments suggesting changes, which led to their rewriting (see Table 4). For percentages between 40 and 70%, there were two different cases: If the respondents’ comments gave enough information, we designed a new indicator that was included in the third phase questionnaire along with the original one; otherwise, we included only the original indicator. As an example, the indicator Pro1 for the proof process was rewritten in a similar form but with some modifications based on the respondents’ comments: Since the comments expressed doubts about the meaning of geometric context, we included some examples in the new indicator Pro1.1 (see Table 4).

Table 4 Modified indicators according to respondents’ comments

During this phase, the panelists provided information about their concerns; to illustrate the Delphi process, we describe some of the panel’s reasoning in this phase. Regarding Pro1 and Pro2, some of the respondents stated that these items would correspond to Van Hiele level 4. Regarding Pro4, the whole item was not clear to some respondents; in particular, the expression “…but also shapes them” was not well understood. With respect to the relevance of different geometric contexts, respondents agreed with Cla2 but not with Def2: Some respondents noted that the question is not about an object having different properties in different contexts but about how different contexts focus their attention on different types of properties.

For a better understanding of the modified indicators, we now provide some examples. One way to illustrate Ide2.1 is to think about the identification of a circle using the Taxicab or the Euclidean metric. If we define the circle as the set of points that are at the same distance from a given point, the shape of this object will depend on the metric of the ambient space. Specifically, if we consider the Taxicab metric, this object looks like a square, while if we consider the Euclidean metric, it looks like a “classical” circle.
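The dependence of the circle on the metric can be checked numerically with a minimal Python sketch. The function names and the sample points are hypothetical and chosen only for illustration; the sketch tests whether a point lies at distance 1 from the origin under each metric.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def taxicab(p, q):
    """Taxicab distance: sum of the absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def on_circle(point, center, radius, metric, tol=1e-9):
    """True when the point is at the given distance from the center."""
    return abs(metric(point, center) - radius) <= tol


center = (0.0, 0.0)
axis_point = (1.0, 0.0)                        # on both unit circles
edge_midpoint = (0.5, 0.5)                     # only on the Taxicab unit circle
diagonal = (math.sqrt(0.5), math.sqrt(0.5))    # only on the Euclidean unit circle

for name, point in [("axis", axis_point), ("edge", edge_midpoint), ("diag", diagonal)]:
    print(name,
          on_circle(point, center, 1.0, euclidean),
          on_circle(point, center, 1.0, taxicab))
```

The point (0.5, 0.5) lies on the segment joining (1, 0) and (0, 1), which is one side of the square formed by the Taxicab unit circle.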

Pro2.1 can also be exemplified using the comparison between the Taxicab and Euclidean metrics. For example, the classical proof that the sum of the three angles of a triangle is 180° can also be carried out using the Taxicab metric, because the only requirement of that proof is that 180° rotations are isometries, which also holds for the Taxicab metric even though the isometries of this latter metric are not the same as the Euclidean ones.

Third Phase

Out of the original 19 indicators, 13 needed to be reevaluated in the third phase (see Table 3). Apart from these, 5 indicators were re-written (see Table 4). Thus, the third phase questionnaire consisted of 18 indicators. We present in Table 5 the results obtained in this final phase. Note that all the percentages increased in the third phase. However, since some of them were still lower than 70%, this was not enough to validate all the proposed indicators.

Table 5 Percentage of persons awarding the maximum degree of relevance—phase 3

For the definition process, there were 6 indicators in the questionnaire proposed in the second phase. Based on the results of the second phase, one of them was rejected and two were validated. The other three indicators were validated after the third phase questionnaire.

Considering the proof process, there were 5 indicators in the questionnaire proposed in the second phase. Based on the ratings obtained in that phase, one was validated and the other four were proposed for reevaluation. Once we had analyzed the comments, indicators Pro1, Pro2, and Pro4 were proposed with an alternative wording (Pro1.1, Pro2.1, and Pro4.1) along with indicator Pro3. According to the third phase results, indicators Pro2.1, Pro3, and Pro4.1 were validated. Indicators Pro1 and Pro1.1 were classified as non-validated since they did not obtain more than 70% of high-relevance ratings. Note that indicators Pro2 and Pro2.1 obtained the same percentage, so we chose Pro2.1 because several respondents expressed their preference for it in their comments.

For the classification process, there were 4 indicators in the questionnaire proposed in the second phase (see Table 2). Based on the results of the second phase, one of them was validated and three were proposed for reevaluation. Since there were no comments on this process, we could not propose a rewording of these three indicators. According to the third phase results, indicator Cla1 was validated, while indicators Cla3 and Cla4 were classified as non-validated.

Considering the identification process, there were 4 indicators in the questionnaire proposed in the second phase. Based on the results of the second phase, one was rejected and the other three were proposed for reevaluation, two of them along with an alternative wording. According to the third phase results, indicator Ide1.1 was validated, while the rest of the indicators were classified as non-validated.

We present in Table 6 the list of validated indicators. We can observe the relevance of definition and proof processes in comparison with the rest of them.

Table 6 Indicators validated by the panel describing the fifth Van Hiele level


Our results highlight the processes of definition and proof as the most relevant in Van Hiele’s level 5. However, classification and identification processes seem to play a certain role at this level. In quantitative terms, the processes of definition and proof have been described through five and four indicators respectively while the processes of classification and identification have been described by two and one indicators respectively.

In the definition process, the use and formulation of definitions are usually studied separately. At this level, the indicators that have appeared are mainly related to the formulation of definitions. These five indicators can be clustered into two different groups: on one hand indicators Def1 and Def2, and on the other hand indicators Def3, Def4, and Def5.

Indicators Def1 and Def2 form the first set of indicators, which is related to the influence of working in diverse geometrical contexts. Indicator Def1 (constructs and uses definitions in different axiomatic systems) is a core part of Van Hiele level 5 (Burger & Shaughnessy, 1986) and expresses the ability to work in different axiomatic systems, whereas indicator Def2 (understands that defining a given mathematical object is not something absolute, but is an action relative to the geometric context in which one works, implying for example that the defined object may have different properties in each context) points out the relevance of reasoning across several geometries as opposed to within a particular geometry (Blair, 2004). In this regard, Silfverberg (2019) states that a person reasoning at level 5 can investigate Euclidean-equivalent concepts in geometries such as spherical or Taxicab, which means that this person can move between different axiomatic systems.

Indicators Def3, Def4, and Def5 are related to the reasons that lead to formulating new definitions or choosing between definitions. Concerning the formulation of new definitions, Martín-Molina et al. (2018) point out: “a popular activity in the field of differential geometry consists of defining new spaces by taking a well-known definition and generalizing it” (p. 1076). Several of our respondents suggested similar ideas leading to the construction of indicator Def3 (defines new objects, for example, because it may be necessary to generalize existing ones or to prove a statement). In their study, Martín-Molina and colleagues identified four phases in the defining process (finding an opportunity to generalize an existing concept, proposing a new definition, justifying that the new definition is valid, and continuing the chain of definitions). Notice that while Def3 can be associated with the generalization phase, indicator Def4 (understands that a definition arises from the need to introduce a new mathematical object or to emphasize a property) highlights the reasons that lead to proposing a new definition.

Once the need for a definition of a concept has been established, it may be necessary to choose among several definitions of the same concept depending on the objectives; this need is reflected in indicator Def5 (compares equivalent definitions to choose the one that interests him/her most, depending on the work to be done). Kemp and Vidakovic (2021) report on the problems faced by students enrolled in a college geometry course when they had to extend the concept of midset from Euclidean to Taxicab geometry. To solve this problem, the students needed to compare and analyze the differences between two equivalent definitions of midset in order to determine which one could be transferred to Taxicab geometry.
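A small computation shows why two definitions that are equivalent in Euclidean geometry may not transfer equally well. The sketch below is only an illustration under stated assumptions: it takes the two definitions of the midset of points A and B to be "the points equidistant from A and B" and "the perpendicular bisector of segment AB", which may differ from the definitions used by Kemp and Vidakovic (2021). The points A, B and the grid are hypothetical.

```python
def taxicab(p, q):
    """Taxicab (L1) distance."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def sq_euclidean(p, q):
    """Squared Euclidean distance (kept as an integer to avoid rounding)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def midset(a, b, dist, points):
    """Definition 1 (assumed): points equidistant from a and b under `dist`."""
    return {p for p in points if dist(p, a) == dist(p, b)}

def on_perpendicular_bisector(a, b, p):
    """Definition 2 (assumed): p - midpoint(a, b) is orthogonal to b - a.

    The test is scaled by 2 so that all arithmetic stays in integers.
    """
    return ((2 * p[0] - a[0] - b[0]) * (b[0] - a[0])
            + (2 * p[1] - a[1] - b[1]) * (b[1] - a[1])) == 0

# Hypothetical data: two points and a small integer grid around them.
A, B = (0, 0), (4, 2)
grid = [(x, y) for x in range(-6, 7) for y in range(-6, 7)]

bisector = {p for p in grid if on_perpendicular_bisector(A, B, p)}
euclidean_midset = midset(A, B, sq_euclidean, grid)
taxicab_midset = midset(A, B, taxicab, grid)

print(euclidean_midset == bisector)  # True
print(taxicab_midset == bisector)    # False
```

On this grid the two definitions coincide for the Euclidean metric, but the Taxicab equidistant set contains points such as (1, 5) that are not on the perpendicular bisector, so only the distance-based definition carries over directly.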

Regarding the proof process, we have found four different indicators (five if we consider Pro1; see Footnote 7), which can be grouped as Pro1 and Pro2.1 on one hand and Pro3, Pro4.1, and Pro5 on the other hand.

Indicators Pro1 and Pro2.1 form a first set of indicators, which is related to the construction of proofs in different geometric contexts. Indicator Pro1 (performs proofs in different geometric contexts) is a fundamental part of Van Hiele’s level 5 (Burger & Shaughnessy, 1986) and expresses the ability to work in different axiomatic systems. Indicator Pro2.1 (is able to consider whether or not a proof can be totally or partially transferable to another geometric context, understanding that proving a given mathematical result is an action linked to the geometric context in which one works) is related to the importance of reasoning across several geometries as opposed to within a particular geometry. In relation to this indicator, Blair (2004) documented instances when mathematics students compared geometries by attending to whether the proof of a theorem in Euclidean geometry could be transferred or adapted to Taxicab or spherical geometry.

Indicators Pro3, Pro4.1, and Pro5 form a second set of indicators related to the knowledge of the internal structure of the proof. The first of them is indicator Pro3 (compares proofs on the basis of criteria that are of interest to them, e.g. the possibility of using them to prove more general results). In this regard, Fernández-León et al. (2020), in a study with professional mathematicians, identified the ability to select between different proving methods to prove the same result. This idea can be connected with Pro3 since this selection requires comparison according to a criterion.

During the proof process, the hypotheses or the conclusions of a proposition may turn out to be inadequate, and one may decide to generalize the proposition or to limit its validity; our respondents expressed this idea, which led to the construction of indicator Pro4.1 (performs mixed formulation-proof processes being able to eventually modify the statement he/she is trying to prove depending on the development of the proof). Boero (1999) expressed this fact by distinguishing between mathematical proofs as a product and proving as a process. He stated that the proving process consists of six phases which are usually interconnected in non-linear ways in mathematicians’ normal work. Based on these ideas, Fernández-León et al. (2020) found two activities that connect to Boero’s ideas: (i) modifying statements and (ii) formalizing modifications of statements related to modifying the hypotheses or conclusions of a theorem.

Indicator Pro5 (structures the proof recognizing partial results that can be useful in another and to which it gives its own entity in the form of lemmas, definitions) is of great interest at this level, and some of the ideas described in it have already been pointed out by several authors. Concerning the identification of partial results, Weber and Mejia-Ramos (2011) showed that mathematicians will eventually construct sub-proofs to verify that each line in a proof follows from previous assertions. Also, Fernández-León et al. (2020) stated that it is common to search for proof techniques or tools used in other proofs that may fit in well with a new proof. Regarding the construction of definitions during the proving process, Larsen and Zandieh (2005) showed an example of a proof analysis that supported the development of a new concept: the idea of a small triangle.

The classification process has only two validated indicators: Cla1 (classifies mathematical theories, for example, to see Euclidean geometry as a particular case of a family of geometries) and Cla2 (understands that classifying a given mathematical object is not something absolute, but an action relative to the geometric context in which one works, implying for example that the equivalence relation between geometric objects varies from one context to another). The former indicator is related to the faculty of comparing axiomatic systems; in this sense, Silfverberg (2019) states that, at the highest Van Hiele level, different geometries can be compared by observing their differences and similarities as axiomatic systems.

The identification process has only one validated indicator: Ide1.1 (identifies geometric objects through processes that transform the given object into an equivalent one that is directly recognizable). For instance, in differential geometry, it is common to identify manifolds via processes such as warped products (different metrics) of the original manifold (Boyer & Galicki, 2007) or by considering the latter as the fiber of a certain fiber bundle, as in the mapping torus construction.

Based on our results, we consider that the Van Hiele level 5 indicators are consistent with the descriptions found in classical works such as Jaime and Gutiérrez (1994), which give the greatest relevance to the proof and definition processes at the highest levels. Moreover, we have stated in our discussion the close relation between both processes at this level. This relationship can be seen in indicators Pro5 and Def3, which explain how a proving process can lead to the definition of a new object while a definition can help to prove a result.

On the other hand, working with a combination of different geometries has resulted in the re-appearance of indicators related to the processes of identification and classification; note that level 4 is presented only in terms of definition and proof processes (Gutiérrez & Jaime, 1995). The Van Hiele model proposes that mathematical objects become explicit (Van Hiele, 1986) as higher levels are reached (e.g., when a person achieves level 2, the elements of a geometrical object become explicit in contrast with the global appreciation of these objects at level 1). In the fifth level, the possibility of working across geometries becomes explicit (in contrast with the somewhat internal work of previous levels), giving a new meaning to the classification process. This meaning includes two different aspects: on the one hand, we can classify different geometric contexts (e.g. Euclidean, hyperbolic, and spherical geometries), and on the other hand, we can understand that different classifications of geometric objects arise from different equivalence relations (the classification of surfaces in geometry depends on the diffeomorphism equivalence while in topology the equivalence relation is given by the homeomorphism).

Concerning the identification process, mastery of the relationships between different geometries, metrics, and so on may allow, in certain cases, the identification at this level to be carried out by working with a metric different from that of the initial problem and then transferring the results to the original metric. In both processes, it is fundamental to work between geometries, which is a distinctive feature of level 5.

Another classical work (Burger & Shaughnessy, 1986) established that, at level 5, systems based on different axioms can be studied and compared. The results obtained in this study agree with this idea; in particular, indicators Def1, Def2, Pro1, Pro2.1, Cla1, and Cla2 focus on the relationship between geometric contexts more than on the geometric contexts considered individually. This idea is related to Blair’s (2004) considerations: “…reasoning within Taxicab or spherical (or any other geometry for that matter) characterized according to the complexity of the students’ object of thought is not inherently more or less ‘advanced’ as that within Euclidean geometry” (p. 337). In this regard, Guven and Baki (2010) highlighted that working with spherical geometry presents difficulties similar to those of working with Euclidean geometry. In particular, they were able to show a certain parallelism between Van Hiele’s levels of reasoning in spherical geometry and in Euclidean geometry.

Some limitations have to be discussed. Firstly, the answers of the respondents could depend on their particular field of expertise. It seems reasonable that experts in differential geometry and algebraic topology would give different ratings to the same indicator. Moreover, if the panelists in future studies were researchers in different areas of geometry, the examples they would propose would be more varied. Secondly, in order to obtain more information and to go deeper into the particular reasons that led the panelists to rate each indicator in one way or another, it could be of interest to include interviews with the panel members in such studies. A third limitation can be found in the application of the Delphi method. Instead of using the number of phases as a termination criterion, we could have decided to study the stability of the ratings. This would probably have produced more reliable results about the validity of indicators such as Pro1, whose percentages were very close to 70%. However, this could have led to a higher number of phases with the risk of losing respondents in the final phases.

Our future research interests include the design of a questionnaire including mathematical tasks in which the obtained descriptors of the fifth level could be identified among the responses. The implementation of such a questionnaire with college/university mathematics majors and the subsequent analysis of their productions could give empirical support to the indicators obtained in this study. This future work could lead to a refinement of the indicators as well as a deeper validation of them.

On the other hand, our future research also includes the study of how level 5 is related to previous levels and how including activities at that level can improve the teaching of geometry. In particular, Pirie and Kieren (1991) established that a person can return to an inner (lower) level of understanding when challenged by outer (higher) level activities. This process, called folding back, “allows for the reconstruction and elaboration of inner level understanding to support and lead to new outer level understanding” (p. 172). This leads to the interest in determining the characteristics of fifth level reasoning, which would allow us to create activities adapted to that level that can promote the full acquisition of the fourth level. This research interest is related to previous works of the authors (Arnal-Bailera & Manero, 2021; Manero & Arnal-Bailera, 2021), where we found that graduate students enrolled in teacher training programs had not completely acquired the fourth level. Our work can also support different current research lines, such as comparative studies between different groups (in-service teachers at different levels, mathematics majors …) with respect to aspects such as demonstration, following those initiated by Buchbinder et al. (2022), or university students’ interactions when working with non-Euclidean geometries (Kaisari & Patronis, 2010).