1 Introduction

Concept maps are becoming a popular educational tool used by teachers and learners across many disciplinary areas. A concept map can be broadly defined as a node-link diagram in which each node represents a concept, and each link represents the relationship between the two concepts it connects, with labels on the links to specify the relationships (Gurlitt & Renkl, 2010; Schroeder et al., 2018). Diagrams resembling concept maps have been used by philosophers and logicians for many centuries (Nesbit & Adesope, 2013), but the term “concept map” and the idea of using it as an educational tool originated with Joseph Novak and his colleagues in the 1970s (Novak, 1990). The main assumption made by its advocates was that constructing concept maps is an effective way to promote meaningful learning. The key feature of a high-quality concept map is a clear hierarchical representation of concepts, with more general concepts placed higher on the map and linked to more specific concepts placed lower on the map (Novak & Cañas, 2008). Novak and colleagues have further recommended that effective concept maps should include horizontal cross-links to elaborate on relationships other than generality/specificity. Figure 1 illustrates three different categories of concept maps for “Flowers”, according to the classification of structural quality formulated by Kinchin et al. (2000). The first type is “spoke” (A in Fig. 1), where the main concept is placed at the centre, and the only connections in the map are between each derived concept and the central concept, resembling the spokes of a wheel. The second type is “chain” (B in Fig. 1), which has a linear structure, developing only a single strand of a specific aspect of the main concept. Neither “spoke” nor “chain” is considered a good example of effective concept mapping. The third type, “net” (C in Fig. 1), is regarded as the best of the three because, in addition to a hierarchy depicting more specific concepts of each strand from the central concept downwards, it also reveals cross-link connections between concepts developed in different strands.

Fig. 1

Structural types of concept maps, adapted from Kinchin et al. (2000)

Educators can use concept mapping in two major ways. First, an expert can devise a concept map as a learning tool to be provided as a guide for learners to study. Alternatively, students can be tasked with creating a map themselves. Supporting the original premise of early proponents, the extant research shows that concept mapping activity has a much greater impact in the latter setting, particularly at a postsecondary level (Schroeder et al., 2018). This is likely due to the meaningful engagement required from the mapper when creating their concept map. This process involves higher-order learning activities such as organising and synthesising. As a result of engaging in such activities, the concept mapping process is believed to enable high-quality learning (Nesbit & Adesope, 2006; O’Day & Karpicke, 2020; Schroeder et al., 2018). However, no integrative cognitive theory has yet been employed to explain the processes by which students learn with concept maps (Gurlitt & Renkl, 2010; Nesbit & Adesope, 2006, 2013).

Despite the lack of theory, many studies have been published demonstrating promising results regarding the utility of concept mapping in educational settings. A recent meta-analytic study showed that learning with concept maps produced a moderate, statistically significant effect (g = 0.58, p < .001) compared to other conventional forms of studying in a synthesis of 142 studies (Schroeder et al., 2018). Of the 142 studies included in the meta-analysis (Schroeder et al., 2018), 118 were classified as STEM. Most investigated the use of concept mapping in learning natural sciences: biology, physics, and chemistry. At the tertiary level, three studies investigated concept mapping as a learning tool in statistics courses (Chiou, 2009; Lambiotte et al., 1993; Sas, 2008). However, none of the studies included in the meta-analysis considered concept mapping in the tertiary mathematics context.

A small corpus of qualitative case studies exists in the mathematics education literature, which illustrates various implementations and suggests the potential viability of concept mapping in mathematics classrooms (Afamasaga-Fuata, 2009; Baroody & Bartels, 2000; Gallenstein, 2011; Ryve, 2004; Wilcox & Sahloff, 1998; Williams, 1998). There is evidence that examples of concept mapping are used in teacher training in the UK, USA and Australia (Afamasaga-Fuata, 2009; Ollerton & Watson, 2001; Prestage & Perks, 2013; Schmittau, 2004), although such use may not be fully reported in research journals. Some work has been done to describe the training needed to enable students to construct concept maps in a Grade-8 Chinese classroom (Jin & Wong, 2010) and unpack the nature of the conceptual understanding held by this group through the use of concept maps (Jin & Wong, 2015, 2023).

In summary, it appears that, specifically for mathematics, the existing research has not fully explored the effectiveness of concept mapping as a learning tool nor analysed its utility as an assessment tool. The present study attempts to make progress by reporting on incorporating concept mapping into a university mathematics course as a weekly task. We employ an exploratory study methodology with two aims: (1) to analyse the relationship between student concept mapping performance and student performance on the final exam and (2) to investigate the relationship between concept mapping performance and a fundamental psychometric construct linking cognitive and affective domains — learner self-efficacy. Before introducing the study, we first outline theoretical justifications in support of concept mapping as a learning and assessment tool based on the educational psychology and mathematics education perspectives and briefly overview the existing literature on concept maps and self-efficacy.

1.1 Theoretical and empirical foundations

1.1.1 Concept mapping as an effective learning strategy

Our understanding of the mechanisms involved in learning from a cognitive perspective has advanced substantially over the last five decades by employing mathematical modelling and experimental testing of those models (Atkinson & Shiffrin, 1968). Simplistically, a modal model of human cognitive architecture could be used to illustrate the current state of research relevant to our context (Sweller, 2008; Sweller et al., 2019). Figure 2, modified from Inglis and Mejía-Ramos (2021), illustrates the three main components of human cognitive architecture and their interactions (Atkinson & Shiffrin, 1968; Clark & Paivio, 1991; Fiorella & Mayer, 2015; Mayer & Moreno, 2003). The first component, sensory memory, allows incoming sensory information (such as what we see, hear, touch, or smell) to be stored long enough for selected information to be transferred to working memory (the second component). Even after the stimuli have ceased, impressions of sensory information can be retained in sensory memory for short periods, provided they are selected for further processing in working memory by the mechanism of attention (Dehaene, 2020).

Fig. 2

 A modal model of human cognitive architecture (modified from Inglis and Mejía-Ramos (2021))

Working memory represents the domain where all conscious thinking happens — it represents a state of awareness while processing information. Crucially, working memory has two limitations. First, the duration for which information can be stored there is very short: almost all of it is lost after 20 s without rehearsal (Peterson & Peterson, 1959; Sweller, 2021). Second, working memory is extremely limited in capacity. It has been known since the mid-1950s that working memory can hold only about seven items of information, plus or minus two (Miller, 1956). However, only about three to four items of information can be processed simultaneously while mentally combining, comparing, or manipulating them (Cowan, 2001). For example, on average, we can remember about seven random digits, but if asked to reorder them from, say, highest to lowest, successful completion of the task would be challenging unless the number of digits is reduced.

The process of learning happens through working memory. There are two ways information can enter working memory. By focusing attention on certain incoming sensory information, one might consciously process a stimulus from sensory memory. The information can also come from the third memory system, long-term memory. Long-term memory is a central component of human cognitive architecture; it represents a repository of an enormous network of complex units of closely linked snippets of information, called schemas. We now know that our sense of self comes from the vast amount of information stored in long-term memory (Sweller, 2021). It is well-evidenced that the main reason experts outperform novices is the superior quality of their schemas, which organise a sizable body of knowledge and index it by a large number of patterns that, on recognition, guide the expert in a fraction of a second to the relevant parts (Chi et al., 1981; Ericsson & Lehmann, 1996; Larkin et al., 1980; Sweller et al., 1998). These schemas constitute long-term memory and, when required, are retrieved and integrated into working memory (Chase & Simon, 1973). Long-term memory appears to have no practical capacity limits and thus can be used to overcome the temporal limitations of working memory. If you have stored a schema in long-term memory, you can repeatedly reintegrate it into working memory, transcending the 20-second limit. Furthermore, long-term memory also functions as a bypass for dealing with the capacity limits of working memory. If snippets of information are organised into a coherent schema and stored in long-term memory, the whole schema can be brought into working memory as one item of information to be manipulated and integrated with other information units.

Since establishing the central role of long-term memory, educational psychologists and neuroscientists have devoted a great deal of attention to understanding how we learn most effectively. It is well understood that in educational contexts, learning is not just an additive process of storing new snippets of information in memory, as in a computer (Fiorella & Mayer, 2015). Rather, learning depends on two major factors: (1) what is presented and how and (2) the cognitive processing that the learner is actively engaged in during learning. Thus, learning is viewed as a generative activity. This conception of learning is a natural theoretical advance built on the coalescence of the cognitive revolution and constructivist ideas about the importance of meaning-making while learning. This well-evidenced theory envisions effective learning to comprise three stages: (1) learners actively select the relevant aspects of incoming information by paying attention, (2) which is followed by organising this information into a coherent cognitive structure in working memory, and (3) integrating cognitive structures with relevant prior knowledge activated from long-term memory (Fiorella & Mayer, 2015).

Mathematics education research further developed these conceptions to distinguish between different types of mathematical understanding and knowledge. These ideas can be traced back to Skemp’s (1976) distinction between “relational understanding” and “instrumental understanding” of mathematics. The latter describes a limiting yet commonly occurring understanding based on knowing a set of rules without understanding the reasons, whereas “learning relational mathematics consists of building up a conceptual structure (schema)” (Skemp, 1976, p. 14). Later research on mathematical thinking divided mathematics knowledge into two types: procedural knowledge and conceptual knowledge (Hiebert & Lefevre, 1986), with the latter defined as “knowledge that is rich in relationships. It can be thought of as a connected web of knowledge, a network in which the linking relationships are as prominent as the discrete pieces of information” (pp. 3–4). This dichotomy was subsequently questioned and reconceptualised by Star (2005) to include the depth dimension (accounting for quality).

The latest mathematics cognition perspective assumes that conceptual understanding is developed when a sufficiently well-organised schema has been encoded into long-term memory (Inglis & Mejía-Ramos, 2021). From this perspective, concept mapping is assumed to make learning efficient because it promotes meaningful learning and requires learners to engage deeply with the material by focusing on the organisational structure of a set of related concepts and producing elaborative connections among them (Gurlitt & Renkl, 2010; Nesbit & Adesope, 2006). The term “knowledge elaboration” is often used in reference to meaningful learning, emphasising the importance of using prior knowledge to expand and refine new insights utilising processes such as organising, restructuring, interconnecting, integrating new elements of information, and identifying relations between them (Kalyuga, 2009). Research has shown that knowledge elaboration is the key mechanism behind the success of well-known learning strategies such as self-explanations (Chi et al., 1994) and elaborative interrogation (Dunlosky et al., 2013). Viewed from this perspective, the similarities between knowledge elaboration strategies and the processes involved in high-quality concept mapping are recognisable. According to Karpicke and Blunt (2011, p. 772), “concept mapping bears the defining characteristics of an elaborative study method: It requires students to enrich the material they are studying and encode meaningful relationships among concepts within an organized knowledge structure.” In other words, the construction of a high-quality concept map involves higher-order thinking necessary to explicate not only generality/specificity relations (e.g., concept maps A and B in Fig. 1) but also cross-links — the connections between concepts derived from different strands developed from the main concept (e.g., C in Fig. 1).
Such constructivist activity is likely to foster meaningful mathematics learning and promote “relational understanding” in the sense of Skemp (1976) by activating efficient processes in working memory in selecting relevant information (from learning resources), integrating it with the existing schemas in long-term memory, and reorganising this knowledge into a new, bigger and better-organised schema, which could subsequently be encoded into long-term memory.

1.1.2 Concept mapping as an assessment tool

Research on assessment distinguishes between summative assessments, designed to determine academic progress after a set unit of material (i.e., assessment of learning), and formative assessments, designed to monitor student progress during the learning process and provide feedback (i.e., assessment for learning) (Chappuis & Stiggins, 2002). However, this distinction has not been a primary focus of recent tertiary mathematics studies. Contrary to the assumption that formative and summative assessment approaches are incompatible, a recent article demonstrated how these assessment forms could be combined in university mathematics teacher education (Buchholtz et al., 2018). Thus, the central consideration in mathematics education at a tertiary level is about what type of reasoning is elicited and assessed by various tasks. Specifically, in line with the distinction between procedural and conceptual knowledge, an important question is how to design valid and reliable methods to assess different types of understanding (e.g., instrumental vs. relational in the sense of Skemp (1976)). The way in which procedural knowledge is assessed has become relatively standardised. In contrast, tasks supposedly designed to assess conceptual knowledge do not always align with theoretical claims about mathematical understanding (Crooks & Alibali, 2014). Moreover, an extensive overview of undergraduate mathematics assessment revealed that most questions on exams and coursework could be categorised as “imitative” — questions that can be solved by performing prescribed algorithms and recalling analogous (if not identical) solutions (Bergqvist, 2007; Iannone et al., 2020; Iannone & Simpson, 2011). Researchers pointed out that this issue is amplified in service mathematics courses designed for non-mathematics majors (Mac an Bhaird et al., 2017).
A comparable service course serves as a setting for the present study, in which we investigate the feasibility of concept mapping as an assessment tool to diversify measuring capacity and as a learning strategy to foster relational reasoning.

1.2 Self-efficacy

Self-efficacy is a commonly used psychometric construct introduced by Bandura (1977) as part of his social cognitive theory. Self-efficacy is defined as a set of one’s beliefs about their capability to perform and complete a particular task (Bandura, 1997). The construct has long been considered to play an essential role in achievement motivation, and much past research has shown that self-efficacy is a significant predictor of academic performance (Pajares & Graham, 1999; Zimmerman, 2000). Since the 1970s, a large body of research based on Bandura’s conception of self-efficacy has been developed, making this construct one of the most studied and utilised in educational psychology research.

Mathematics-specific self-efficacy has been given special consideration in mathematics education research and is generally viewed as a belief about one’s capacity for doing mathematics (Pajares & Miller, 1995). Methods for measuring it have been developed and used to demonstrate that students with higher self-efficacy tend to show greater interest, effort, persistence, help-seeking behaviour, and, ultimately, greater mathematics achievement than those who feel less efficacious (Pajares & Graham, 1999; Pajares & Kranzler, 1995; Pajares & Miller, 1995; Schukajlow et al., 2019; Skaalvik et al., 2015; Williams & Williams, 2010).

1.2.1 Types of self-efficacy

Some researchers make a distinction between academic self-efficacy, defined as “general perceptions of academic capability” (Richardson et al., 2012, p. 356), and performance self-efficacy, defined as “perceptions of academic performance capability” (Richardson et al., 2012, p. 356). When students face a situation, the type of self-efficacy that influences them depends on whether the situation is familiar. If the situation is familiar, the student will use previous experience to predict their performance based on their performance self-efficacy. However, if the situation is unfamiliar, students tend to rely on more generalised competency beliefs, known as academic self-efficacy, to predict their performance (Zimmerman et al., 1992). Specifically for mathematical problem-solving in a high school classroom, Pajares and Kranzler (1995) demonstrated that performance self-efficacy has a strong direct effect on mathematics anxiety and performance even when controlling for general mental ability.

More recent research sought to develop our understanding of assessment-related self-efficacy with a focus on both students’ beliefs about content-specific tasks and their beliefs around assessment-taking, such as whether or not they are good at taking tests. For example, it is not uncommon to hear people lament, “I am not good at taking tests” or “I know I can do it, just not on an exam”. To incorporate this more nuanced perspective into the construct of self-efficacy, researchers developed a new instrument, the Measure of Assessment Self-Efficacy (MASE), designed to assess beliefs related to assessment preparation and performance, which is most relevant to research undertaken in a natural educational setting such as a university course (Riegel et al., 2022).

1.2.2 Sources of self-efficacy

There are ways to improve one’s self-efficacy. Bandura (1997) suggested four sources of self-efficacy: (1) mastery experience, (2) vicarious experience, (3) social persuasion, and (4) emotional and psychological states. Mastery experience, which can be thought of as former attainment, was reported to have the most substantial influence (Usher & Pajares, 2008). This source is especially powerful when the experience involves overcoming a challenge.

1.2.3 Self-efficacy and concept mapping

Only a few studies have investigated the effect of concept mapping on self-efficacy. For example, a study by Chularut and Debacker (2004) used a randomised pretest-posttest control group design to examine the impact of concept mapping on students learning English as a second language in the USA. The researchers found that the students in the concept mapping group showed a slightly greater self-efficacy increase than those in the alternative learning strategy group. Similar results were reported by Khajavi (2012), who undertook a study involving students enrolled in an English reading comprehension course in Iran, demonstrating a substantial effect size of the concept mapping intervention. Analysing the effect on self-efficacy from a different angle, Gurlitt and Renkl (2010) showed that different concept mapping activities (label-provided-lines vs. create-and-label-lines) resulted in significantly different self-efficacy measures in a study of German psychology students.

In summary, although it is naturally anticipated that concept mapping could positively impact self-efficacy, this topic has rarely been studied, and no such studies were undertaken in mathematics education contexts. From the perspective of mathematics learning theories, the type of reasoning elicited by concept mapping is conducive to forming a relational/conceptual understanding of mathematics, which could enhance self-efficacy as a mastery experience source. Furthermore, self-efficacy could be impacted through the mechanism of metacognition: regular concept mapping tasks prompt students to be critically aware of their thinking and learning and reflect on the depth of their understanding. This is likely to impact their perceptions of themselves as mathematics thinkers and learners, thereby affecting their self-efficacy (Coutinho, 2008).

1.3 Research questions

The present study had two broad goals. The first is to report on the design, development, and deployment of a new type of assessment in a large mathematics course at the university level. Students were tasked to construct a concept map weekly either as part of a problem-set during their tutorials (practical problem-solving sessions) or as a marked assignment. The second goal was to evaluate this initiative using student coursework data and their responses to a questionnaire (administered at the end of the semester) by conducting statistical analyses to determine and compare the effects of various assessment forms on student performance and self-efficacy. Specifically, we sought to answer the following research questions:

  • RQ1. Does student concept mapping performance explain a statistically significant amount of variance in the final exam scores after accounting for other coursework assessments?

  • RQ2. Does student concept mapping performance explain a statistically significant amount of variance in their self-efficacy (assessment-related) after accounting for other coursework assessments?
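Both research questions call for testing whether adding concept mapping performance to a model of the other coursework predictors yields a statistically significant increment in explained variance, i.e., a hierarchical regression with an \(\Delta R^2\) F-test. The following is a minimal sketch of that analysis on synthetic data; the predictor names, coefficients, and simulated values are our illustrative assumptions, not the study’s actual data or model.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dev = y - y.mean()
    return 1 - (resid @ resid) / (dev @ dev)

rng = np.random.default_rng(0)
n = 219  # matches the study's final sample size
quiz = rng.normal(size=n)   # hypothetical standardised quiz total
test = rng.normal(size=n)   # hypothetical mid-semester test score
cmap = rng.normal(size=n)   # hypothetical concept mapping performance
exam = 0.5 * quiz + 0.4 * test + 0.3 * cmap + rng.normal(size=n)

ones = np.ones(n)
X_reduced = np.column_stack([ones, quiz, test])        # other coursework only
X_full = np.column_stack([ones, quiz, test, cmap])     # + concept mapping

r2_red = r_squared(X_reduced, exam)
r2_full = r_squared(X_full, exam)

# F-test for the R^2 increment from adding q = 1 predictor
q, p_full = 1, 3  # predictors added / total predictors (excl. intercept)
F = ((r2_full - r2_red) / q) / ((1 - r2_full) / (n - p_full - 1))
```

The same comparison applies with self-efficacy scores as the outcome for RQ2; in practice one would read off the p-value of F against an F(q, n − p_full − 1) distribution.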

2 Methods

2.1 Research site

The research site of this study was a large research-intensive university (University of Auckland, New Zealand). The study was implemented in a stage-II undergraduate mathematics course that serves the needs of students majoring in various disciplines such as finance, economics, chemistry, physics, computer science, and statistics. The content covered in this course comprises Calculus II, Linear Algebra II, and an Introduction to Ordinary Differential Equations. As with most courses at this university, this course is delivered over 12 teaching weeks with the repeated structure of three 1-hour lectures and one 1-hour tutorial (25 to 30 students per room working on problems) each week. The lectures include active learning activities such as interactive whole-class questioning/discussions and think-pair-share tasks.

2.2 Participants and study description

In the trial semester, out of the 355 students enrolled in the course, 35 studied overseas, completing the course online; the remaining 320 studied locally. Tutorials are an essential component of this course and have been implemented in an active learning format for over a decade. During tutorials, students are encouraged to interact and solve various problems in small groups. All students are expected to attend a tutorial each week, starting from the 2nd week of the semester. In addition to this work, students must submit their solutions to a weekly “marked problem”; these were traditional written assignments, except for two that were concept mapping tasks. Hence, study participants were expected to construct a concept map weekly, either as part of the tutorial question set or when assigned as a marked problem.

Specifically, two marked problems asked students to construct concept maps: one on Series (Week 3) and one on Vector Spaces (Week 7). A template, called a Knowledge Organiser, was provided to the students as a guide. Prior to creating a concept map, students were prompted to state the definition of a given concept and provide at least two examples and at least one non-example. Students were allowed a week to complete the task. Out of the ten concept maps that students completed during the course, only the two assigned as “marked problems” were collected for marking.

In the first lecture of the course, the two authors of this study provided a short introductory session on concept mapping. In addition, an example of a concept map on Linear Systems (a prerequisite topic for the course) was shared with the students online so that they could refer to it at any time. However, the quality of the submissions collected for marking in Week 3 indicated that some students did not fully understand the instructions for this task or found it very difficult to construct a concept map. Hence, the first author, who is a lecturer of the course, gave another explanation during a lecture to demonstrate how to construct a high-quality concept map.

A model solution was provided to the students, noting that there can be many variations of a high-quality concept map (Fig. 3). In order to assess students’ work on concept maps, a rubric was developed and validated by the researchers, which was reported in a separate study (Jeong & Evans, 2021). The validation methodology was based on comparing four rubrics, developed according to previous research, by examining their reliability and validity. Rubric 1 was based on a qualitative assessment of intrinsic qualities of a concept map (such as the type of concept map: spoke, chain, net (Kinchin et al., 2000)), combined with perceived effort on a discrete 4-point scale. The other three rubrics were analytical, counting the number of concepts (Rubric 2), relationships (Rubric 3) and the (inverse) ratio between them (Rubric 4). Regression modelling was used to compare the extent to which the four rubrics explained the variance in the scores on a concept map question on the final exam. (The final exam included a question on concept mapping — see Figs. 4 and 5). Two rubrics were identified as most valid: Rubrics 1 and 4. However, taking into account reliability considerations, Rubric 4 was selected as the most suitable, ensuring objectivity in assigning scores: The ratio method outputs a numerical value equal to the inverse ratio of the number of concepts used in the concept map to the number of relationships identified. For example, the score for the concept map in Fig. 3 is obtained by calculating the inverse ratio of the number of concepts (= 25) and the number of relationships between them (= 30). Hence the score is 30/25 = 1.2. According to this method, for a fixed number of concepts, the higher scores indicate high-level elaboration, with more related concepts connected vertically and horizontally in a net-type formation. 
On the other hand, using fewer concepts limits the upper bound of the inverse ratio, since the maximum number of connections for a graph with \(n\) nodes is \(\frac{n(n-1)}{2}\) (in the case of a complete graph). Hence, the maximum inverse ratio for a graph with \(n\) nodes is \(\frac{n-1}{2}\). An extreme example is a concept map with only two concepts connected by one relation, giving a score of \(1/2 = 0.5\). (More information can be found in Jeong & Evans, 2021.)
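The ratio method described above reduces to two small formulas, sketched below; the function names are ours for illustration and do not appear in Jeong & Evans (2021).

```python
def concept_map_score(n_concepts: int, n_relationships: int) -> float:
    """Rubric 4 inverse-ratio score: relationships per concept."""
    return n_relationships / n_concepts

def max_score(n_concepts: int) -> float:
    """Upper bound on the score: a complete graph on n nodes has
    n(n-1)/2 links, so the ratio cannot exceed (n-1)/2."""
    return (n_concepts - 1) / 2

concept_map_score(25, 30)  # Fig. 3 example -> 1.2
concept_map_score(2, 1)    # two concepts, one relation -> 0.5
max_score(25)              # -> 12.0
```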

Fig. 3

Concept map on Series

Fig. 4

Concept map question in the final exam

Fig. 5

In-line choice options for A, B, and C for identifying missing elements of the concept map in Fig. 4

2.3 Exam question on concept mapping

The final exam comprised 30 multi-choice questions, with one question on concept mapping organised as three in-line choice options worth three marks (10% of the final exam). The overarching idea of this question was to assess students’ consolidated macro view of the concept of Vector Space as a mathematical structure used to generalise physical spaces such as \(\mathbb{R}\), \({\mathbb{R}}^{2}\), \({\mathbb{R}}^{3}\), and then \({\mathbb{R}}^{n}\); to define fundamental subspaces related to matrices (such as Nullspace, Column space, Eigenspace) and to be utilised in solving differential equations. The concept map question was presented as a completed concept map with a few missing concepts and relationships for students to identify (Fig. 4).

Students were presented with three in-line choice questions with multi-choice options to replace objects A, B, and C in the concept map. The multi-choice options given for each of A, B, and C are shown in Fig. 5, with the correct answers at the bottom of the lists. In addition, considering that it may be the first time students would have encountered such a question in an examination, a mock exam containing a similar question was provided at the end of the semester.

3 Data collection and availability statement

The data was collected from the Learning Management System (Canvas) and a questionnaire administered at the end of the semester through Qualtrics. All students (N = 355) were invited to participate in the study but not all consented. The final dataset contained 219 participants. The raw dataset is available at https://doi.org/10.17608/k6.auckland.19618014.

3.1 Coursework marks

Coursework marks considered for this study included the following:

  • 1 final exam (50%).

  • 1 mid-semester test (20%).

  • 30 quizzes (15%).

  • 10 marked problems (10%).

  • 10 tutorial participation marks (5%).

The 30 quizzes were short multi-choice assessments conducted online, assessing basic comprehension of material introduced in a previous lecture, starting from Week 2 (Evans et al., 2021). The lowest four marks were disregarded, with the sum of the top 26 quiz marks counted towards the final course mark.
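The “top 26 of 30” aggregation amounts to dropping the lowest four scores before summing; a minimal sketch (the marks below are hypothetical):

```python
def quiz_total(marks: list[int], n_dropped: int = 4) -> int:
    """Sum of quiz marks after discarding the lowest n_dropped scores."""
    return sum(sorted(marks)[n_dropped:])

marks = [3, 0, 2, 1] + [3] * 26  # 30 quizzes, four weaker results
quiz_total(marks)  # drops 0, 1, 2 and one 3 -> 78
```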

The ten marked problems were short written assignments due at the end of each week, submitted for marking and feedback. As explained above, two of the marked problems were on concept maps, one on Series and the other on Vector Spaces. The ratio method (described above) was used to assign scores to students’ concept maps.

The exam comprised 30 multi-choice questions, with more than a third (12 points out of 30) attempting to assess conceptual understanding (including the question on filling in the blanks in a concept map described in 2.3). Exam questions are available at https://doi.org/10.17608/k6.auckland.19618014.

3.2 Self-efficacy measures

To measure students' self-efficacy in this study, a tailor-made instrument measuring tutorial-related self-efficacy was used: the Measure of Assessment Self-Efficacy for Mathematics Tutorials (MASE-T), based on the general assessment self-efficacy instrument MASE (Riegel et al., 2022). The instrument explicitly measures assessment-related self-efficacy and comprises two factors: (1) Comprehension and Execution and (2) Emotional Regulation. The MASE items assess participants' beliefs in their ability to understand, perform, and emotionally regulate while studying for and during various types of course assessment. Instruments measuring exam-related self-efficacy (MASE-E) and quiz-related self-efficacy (MASE-Q) were developed and validated previously (Riegel et al., 2021, 2022), together with general guidelines for adapting the instrument to other types of assessment. In this study, those guidelines were used to adapt the instrument to tutorials, which involve mathematical problem-solving without the stress associated with high-stakes assessments such as tests and exams. Students were prompted with the following tutorial scenario:

Please envision yourself in the scenario: you are enrolled in a mathematics course that has a weekly tutorial worth 0.5% of your final grade. The tutorial contains four questions on a topic you studied in your course. In considering taking part in this tutorial, please rate the extent to which you agree or disagree with the following statements. (The items comprising the scales are listed in Table 1).

Responses to the statements were measured on a slider scale from 1 to 100 (where 1 = Cannot do at all, 50 = Moderately sure can do, and 100 = Highly certain can do). Confirmatory factor analysis of students' responses collected in week twelve of the semester confirmed a two-factor model, consistent with the general structure of MASE; the model offered an acceptable fit (χ2/df = 2.203, CFI = 0.986, SRMR = 0.0238). The internal consistency of the scales was confirmed using Cronbach's alpha: α = 0.917 for the Comprehension and Execution factor and α = 0.910 for the Emotional Regulation factor, both well above the conventional 0.7 threshold for acceptable reliability.
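The internal-consistency check can be sketched with the standard Cronbach's alpha formula, α = k/(k−1) · (1 − Σ item variances / variance of totals). The response matrix below is synthetic and the function is our own illustration, not the study's analysis code:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, k_items) matrix of scale responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scale scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Simulate 200 respondents answering 4 items driven by one shared trait,
# so the items are strongly correlated and alpha should be high.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
items = latent + 0.5 * rng.normal(size=(200, 4))
alpha = cronbach_alpha(items)
print(round(alpha, 3))  # high alpha for strongly correlated items
```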

Table 1 Tutorial-related self-efficacy measure (MASE-T)

4 Results

In addressing the study’s research questions, hierarchical multiple regression analyses were undertaken. Table 2 summarises the descriptive statistics and correlations of the variables included in the analyses.

Table 2 Descriptive statistics and correlations of factors in the hierarchical multiple regressions

4.1 Concept map score as a predictor of the final exam score (RQ1)

A hierarchical multiple regression analysis was undertaken to investigate the ability of concept map scores to explain the variance in the final exam scores after controlling for other, more traditional assessment components. Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, and homoscedasticity. Additionally, the correlations among the predictor variables were examined, as presented in Table 2. Quiz scores were not included in the analysis due to high correlations with both tutorial marks and marked problems. Most correlations were weak to moderate, ranging between r = .17, p < .01 and r = .43, p < .001, with one exception: a borderline high correlation between tutorial scores and marked problems (r = .70, p < .001). Moreover, the addition of the tutorial scores was found to reduce the adjusted R2, justifying their removal as a predictor in the final model (see Table 3).

Table 3 Summary of Model 1

In the first step of the hierarchical multiple regression, the test score was entered as the most likely predictor of the exam score. This model was statistically significant, F(1, 217) = 37.49, p < .001, and explained 14.7% of the variance in the final exam scores (adjusted R2 = 0.143). The addition of marked problem scores in Step 2 resulted in a statistically significant increase in adjusted R2 of 0.050, F(1, 216) = 14.43, p < .001. In the final model (Step 3), after controlling for the test and marked problem scores, the addition of the concept map scores resulted in a statistically significant increase in adjusted R2 of 0.025, F(1, 215) = 4.89, p = .028. All three assessment types made a significant unique contribution to the final model (Table 3). The concept map score recorded a standardised Beta value of 0.14, p < .05, indicating that, after accounting for the effects of all other variables, an increase of one standard deviation in the concept mapping score is associated with an increase of 0.14 standard deviations in the final exam score. In summary, the concept map scores significantly increased the proportion of explained variance in the final exam score above and beyond what was accounted for by the test and marked problem scores alone.
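The stepwise procedure above can be sketched as a nested-model comparison, with an F-change test on the increment in R2 when a predictor is added. The data below are simulated for illustration only and do not reproduce the study's estimates:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def f_change(r2_full, r2_reduced, n, k_full, k_added):
    """F statistic for the increase in R^2 when k_added predictors are added."""
    return ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / (n - k_full - 1))

# Simulated scores for n = 219 students (matching the study's sample size).
rng = np.random.default_rng(1)
n = 219
test = rng.normal(size=n)
problems = 0.4 * test + rng.normal(size=n)
cmap = 0.3 * test + rng.normal(size=n)
exam = 0.5 * test + 0.3 * problems + 0.2 * cmap + rng.normal(size=n)

# Step 2: test + marked problems; Step 3: add concept map scores.
r2_step2 = r_squared(np.column_stack([test, problems]), exam)
r2_step3 = r_squared(np.column_stack([test, problems, cmap]), exam)
F = f_change(r2_step3, r2_step2, n, k_full=3, k_added=1)  # df = (1, 215), as in the study
print(round(r2_step3 - r2_step2, 3), round(F, 2))
```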

4.2 Concept map score as a predictor of the tutorial-related self-efficacy (RQ2)

Hierarchical multiple regression analyses (Model 2a and Model 2b) were conducted to ascertain whether concept mapping tasks make an independent contribution to explaining the variance in the Tutorial-Related Self-Efficacy construct, which comprises two factors: Comprehension and Execution (CE) and Emotional Regulation (ER). Initially, we used all coursework assessment components as predictors, but these models were not viable owing to multicollinearity between quizzes and marked problems, and between quizzes and tutorial marks, indicated by pairwise correlations greater than 0.7. Therefore, the quiz scores were removed from the model. After removing quizzes, none of the predictors had pairwise correlations higher than 0.7, and all other model assumptions were satisfied.
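The multicollinearity screen can be sketched as a pairwise-correlation check against the 0.7 threshold; the variable names and simulated data below are illustrative only and do not reproduce the study's correlation structure:

```python
import numpy as np

def collinear_pairs(X: np.ndarray, names, threshold=0.7):
    """Return predictor pairs whose absolute sample correlation exceeds the threshold."""
    r = np.corrcoef(X, rowvar=False)  # columns of X are variables
    return [(names[i], names[j], round(float(r[i, j]), 2))
            for i in range(len(names)) for j in range(i + 1, len(names))
            if abs(r[i, j]) > threshold]

# Simulate marks where quizzes drive both marked problems and tutorial marks.
rng = np.random.default_rng(2)
n = 219
quizzes = rng.normal(size=n)
problems = 0.8 * quizzes + 0.6 * rng.normal(size=n)   # strongly related to quizzes
tutorials = 0.8 * quizzes + 0.6 * rng.normal(size=n)  # strongly related to quizzes
test = rng.normal(size=n)

X = np.column_stack([quizzes, problems, tutorials, test])
flags = collinear_pairs(X, ["quiz", "problems", "tutorials", "test"])
print(flags)  # expect pairs involving the quiz score to be flagged
```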

4.2.1 Model 2a (comprehension and execution factor of MASE-T)

The summary of Model 2a is presented in Table 4, in which the full model (Step 4), with the test, marked problem, tutorial and concept map scores predicting the Comprehension and Execution factor of tutorial self-efficacy (CE of MASE-T), was statistically significant, R2 = 0.198, F(4, 214) = 13.21, p < .001; adjusted R2 = 0.183. The first model (Step 1), where only the test score predicts CE of MASE-T, was statistically significant with adjusted R2 = 0.158, F(1, 217) = 41.98, p < .001. The addition of marked problem scores resulted in a significant increase in adjusted R2 of 0.030, F(1, 216) = 9.04, p < .01. However, the addition of tutorial marks and concept map scores resulted in a reduction of adjusted R2. The only significant predictor in the final model was the test score.

Table 4 Summary of Model 2a

4.2.2 Model 2b (emotional regulation factor of MASE-T)

The summary of Model 2b, explaining the variance in the Emotional Regulation factor of tutorial self-efficacy (ER of MASE-T), is given in Table 5. The full model (Step 4), with the test, marked problem, tutorial and concept map scores predicting ER of MASE-T, was statistically significant, R2 = 0.169, F(4, 214) = 10.89, p < .001; adjusted R2 = 0.154. The first model (Step 1), where the only predictor is the test score, was statistically significant with adjusted R2 = 0.141, F(1, 217) = 36.69, p < .001. In Step 2, the addition of marked problem scores resulted in a non-significant increase in adjusted R2 of 0.003, F(1, 216) = 1.76, p = .19. The addition of tutorial participation marks in Step 3 resulted in a small reduction in the adjusted R2. However, the addition of concept map scores in the final Step 4 resulted in a statistically significant increase in adjusted R2 of 0.014, F(1, 214) = 4.40, p = .037. This indicates that the concept map score explains a statistically significant amount of variance in ER of MASE-T over and above what was explained by all other assessment components. Importantly, only the test and concept map scores had a significant effect on ER of MASE-T in the final model, with standardised Beta values of β = 0.33, p < .001 and β = 0.14, p = .037, respectively. This means that, of all the low-stakes assessments, only the concept mapping regression coefficient is statistically significantly different from 0, implying a positive linear relationship with ER of MASE-T after accounting for the effect of all other variables.

Table 5 Summary of Model 2b

In summary, student concept mapping performance explains a statistically significant amount of variance in the Emotional Regulation factor of self-efficacy after accounting for other coursework assessments, which is an unexpected result.

5 Discussion

One of the major aims of this study was to test whether student concept mapping performance explains a statistically significant amount of variance in the final exam scores after accounting for other coursework assessments (RQ1). Hierarchical multiple regression was used to ascertain this: a nested modelling technique that isolates the unique contribution of a variable to explaining variance in the dependent variable, above and beyond what is accounted for by the other variables, and tests the significance of the improvement. First, all three assessment types considered (Test (β = 0.26), Marked problems (β = 0.22), and Concept maps (β = 0.14)) made significant unique contributions to the final model (Table 3). Second, the addition of concept map scores in the final step of the nested modelling significantly improved the model's ability to explain variance in the final exam scores above and beyond what was accounted for by the test and marked problem scores alone. This finding supports the hypothesis that concept mapping is a principally different type of assessment, detecting learners' capabilities that are not discerned by regular assessments. Taken together with the latest mathematics cognition perspective, which assumes that conceptual understanding is developed when a sufficiently well-organised schema has been encoded into long-term memory, it is plausible to suggest that concept mapping (as an externalisation of a schema) may have the potential to assess conceptual understanding. This would be useful since the limitations of conventional assessments in evaluating conceptual understanding are widely acknowledged. Many attempts have been made to design special instruments to assess conceptual understanding. For example, a Calculus Concept Inventory (CCI) was developed by Epstein (2007), which has been widely used in research and practice for over a decade (Epstein, 2013).
However, serious criticism has recently been levelled at the validity of the CCI measure (Gleason et al., 2019). Moreover, in a study involving foundation mathematics students in the UK, the low internal consistency for both the subset and full set of CCI items suggested that the instrument does not measure a single construct (Bisson et al., 2016), thus demonstrating that accurate evaluation of conceptual understanding via an instrument remains a largely elusive goal.

From the theoretical perspective, this is also a plausible hypothesis. The construction of a high-quality concept map involves explicating not only generality/specificity relations but also cross-links, the connections between concepts derived from different strands developed from the central concept (Fig. 1C). An ability to construct such a map is likely to correspond to a "relational understanding" in the sense of Skemp (1976), reflecting a superior conceptual understanding.

However, a major limitation of this study is the lack of robust evidence to make generalisable claims about the utility of concept mapping as an assessment tool for conceptual understanding. Future research could focus on a detailed analysis of the qualitative features of concept mapping that differentiate it from conventional assessments, and on comparing it with existing validated methods for assessing conceptual understanding, such as the comparative judgement (CJ) approach, which has been used successfully to assess conceptual understanding of several mathematical topics: derivatives, p-values, and the use of letters in algebra (Bisson et al., 2016, 2020; Jones et al., 2019).

The other aim of this study was to investigate whether student concept mapping performance explains a statistically significant amount of variance in their self-efficacy after accounting for other coursework assessments (RQ2). Hierarchical multiple regression was used to compare two models: Model 2a (predicting the Comprehension and Execution factor of tutorial self-efficacy) and Model 2b (predicting the Emotional Regulation factor of tutorial self-efficacy). Both models were significant, albeit explaining only a modest proportion of the self-efficacy variance (adjusted R2 between 0.154 and 0.183). However, one noteworthy difference between the models is that the change in adjusted R2 in the final step (adding concept mapping) was only significant in Model 2b. This indicates that concept mapping performance explains a statistically significant amount of variance in the Emotional Regulation factor of self-efficacy after accounting for other coursework assessments. This finding was somewhat unexpected. It is known that the main source of self-efficacy is mastery experience, which is especially powerful when the experience involves succeeding on challenging tasks (Bandura, 1997; Usher & Pajares, 2008). Accordingly, it was reasonable to expect that concept mapping performance would contribute to explaining the Comprehension and Execution factor of self-efficacy because of the nature of the concept mapping activity. The type of reasoning elicited by concept mapping is conducive to forming a relational/conceptual understanding of mathematics, which could lead to enhanced appraisals of comprehension. Moreover, the comprehension marker of self-efficacy could have been impacted through metacognition, as explained in Section 1.2.3: concept mapping tasks prompt students to be critically aware of their thinking and learning and to reflect on the depth of their comprehension. However, our result did not confirm this hypothesis.
Instead, the regression modelling pointed to the importance of investigating emotional regulation as part of the cognitive processes involved in learning with concept mapping.

The role of emotions and beliefs in task completion has recently been identified as an important area of research on mathematics-related affect (Hannula, 2002; Hannula et al., 2016; Schindler & Bakker, 2020; Zan et al., 2006). From a broader perspective, an extensive body of research has demonstrated that students' emotions profoundly affect their academic engagement and performance, identifying a particular impact of positive and negative moods on task completion (Pekrun & Linnenbrink-Garcia, 2012). Given these considerations and our findings, we can hypothesise that successful concept mapping activity could enhance learners' emotional regulation appraisals due to the distinct characteristics of the activity. It could be that concept mapping is superior to other conventional types of assessment through its influence on the positive emotions experienced by a learner when a comprehensive concept map is completed. The self-reflection on learning that happens at that moment could be more profound than, say, after completing a multi-choice test or submitting an assignment containing problems that mimic provided worked examples. Through the repeated experience of regularly constructing high-quality concept maps over the course of the semester, the positive emotions induced by the learning activity could accumulate into a substantial improvement in the learner's affect.

However, our results are limited in that they do not provide any evidence about the causality of the relationship. It could be that the direction of the relationship is the other way around: learners with better emotional regulation produce higher-quality concept maps. Given our results, we can only conclude that there is a positive linear relationship between the Emotional Regulation factor of self-efficacy and concept mapping performance after accounting for the effect of other assessments. Still, we cannot ascertain that the learning activity has caused an improvement in the Emotional Regulation factor measure. The measure consists of items pertaining to the suppression of negative emotions, recording the extent to which students agree with statements such as: "Even when I struggle while studying, I am able to stay positive about my ability to succeed". A likely explanation is that learners who believe they can remain positive and succeed when they struggle during a learning activity are more likely to persevere in the face of a challenge and not give up.

An important finding from our analysis is that, among the low-stakes course assessments, only concept maps are positively associated with the Emotional Regulation factor of self-efficacy, controlling for other variables. In other words, no linear relationship with the conventional coursework assignments (weekly marked problems) was identified: there is no difference in the Emotional Regulation measure between students who do well and those who do not on a typical homework assignment, controlling for other assessments. This contrasts with the concept mapping indicators, which perhaps suggests, from yet another perspective, that concept mapping is a principally different learning activity compared to a typical mathematics task. The theoretical perspective outlined in Section 1.1.1 offers justification for this conclusion, because success in a concept mapping task requires perseverance and sustained effort in deliberate meaning-making. The learner must engage deeply with the material, focusing on the organisational structure of a set of related concepts in order to produce elaborative connections among them as part of externalising and reforming a schema. Such effortful meaning-making could be a missing facet of many traditional mathematical tasks often assigned to learners.

6 Final remarks

Much has been said about the need to seek research-grounded solutions to improve practice. A particular focus has been placed on classroom-based interventions, which are rarely undertaken and evaluated in mathematics education (Stylianides & Stylianides, 2013). To that end, one of the goals of our study was to report on the design, development, implementation, and evaluation of a novel type of assessment in a university mathematics course. The design principle of our intervention is generalisable and transferable to other educational domains as a blueprint for an assessment structure and related instruction that could be utilised in mathematics education broadly. Stylianides and Stylianides (2013) propose three dimensions for evaluating classroom interventions: (1) how amenable it is to scaling up, (2) how practicable it is for curricular integration, and (3) how capable it is of producing long-lasting effects. Evaluated this way, our intervention can arguably be deemed effective on the first two criteria: the number of students utilising concept mapping is unlimited, and concept maps can be easily integrated into existing curricular structures in a practicable way. Determining long-lasting effects is a more complex challenge, and ongoing research is needed. Future research could build on the evidence presented in this study to investigate the effectiveness of concept mapping as an assessment and a learning tool in different and/or properly controlled settings.