1 Understanding the concept of integral

In many countries, integral calculus is established as a key component of mathematical education at school during the higher stages of secondary school and in calculus courses at university. There are various aspects to the concept of integral, each emphasizing different facets. For example, the definite integral—to which we restrict our attention here—is often interpreted as the area under a graph or the total variation of a dynamic process (Kouropatov and Dreyfus 2013). For university students, Jones (2013) identified different facets or conceptual schemas of the integral: the ideas of ‘adding up pieces’ and ‘infinite addition’ involve thinking similar to that involved in a Riemann sum; the ‘perimeter and area form’ involves the conceptualization of the definite integral as the area of a fixed region, and the ‘function matching form’ is closely linked to the antiderivative process. The main thesis of this paper by Jones is “that student difficulties might not necessarily arise from lack of knowledge, but from the activation of less-productive cognitive resources over others” (p. 138). However, empirical studies also have repeatedly shown that students at school and at university experience fundamental difficulties in understanding the concept of integral (cf. for example Orton 1983; Thomas and Hong 1996; Baumert et al. 1999; Serhan 2015; Huang 2012; Jones 2018; Rösken and Rolka 2007; Hashemi et al. 2014). “In summary, the research on definite integrals found that student knowledge was limited to procedural knowledge since they were good at computing the integral but had difficulty (…) connecting the different representations of the definite integral” (Serhan 2015, p. 85).

Empirical studies have offered some insight into the possible causes. Jones (2018) considered the teaching approaches used in the classroom—although only seven teachers were involved in the study. “The results suggest that students may be inducted into the usage of these kinds of ‘best representational fit’ images from their calculus instructors” (Jones 2018, p. 914), since five of the seven teachers represented the definite integral in precisely this manner, described by the author as prototypical. Other studies have shown that conventional mathematics lessons on integral calculus develop only a very limited conceptual understanding of the notion (e.g. Thomas and Hong 1996; Belova 2006). Moreover, mathematics instruction should take heed of the fact that an extended understanding of the integral must be expected in the transition from school to university, since the two settings have different priorities (Orton 1983).

To sum up, empirical studies show that university students develop advanced procedural skills when working with integrals on a symbolic level. However, they have difficulties with conceptual knowledge: knowing different aspects or facets of the integral, interpreting different representations, and making connections between these representations, especially to the symbolic representation (see Hiebert 1986). Therefore, to achieve a substantial understanding of the concept of integral, it is necessary to take a wider view of the concept, to see different aspects in relation to different applications, and to connect it to different representations. We consider the development of rich and sustainable basic mental models to be essential for this wider conceptual understanding of the integral.

2 Basic mental models

2.1 Basic mental models and mathematical aspects of mathematical concepts

In German-language pedagogy and didactics of mathematics, there have been theoretical considerations for over 200 years on what ideas learners should develop about mathematical concepts, for example by Pestalozzi, Herbart, Kühnel, Breidenbach, Oehl, and Griesel (cf. vom Hofe 1995, 1996; vom Hofe et al. 2005; vom Hofe and Blum 2016). A key term in this discussion is the German word ‘Grundvorstellung’. It consists of two components. The first part, ‘Grund’, means ‘basis’ and the second part, ‘Vorstellung’, means ‘idea’, ‘notion’, or ‘mental model’. Thus, ‘Grundvorstellung’ can be translated as ‘Basic Mental Model’, which we abbreviate to BMM. To illustrate this term, we use the definite integral as an example.

A basic mental model (‘Grundvorstellung’) of a mathematical concept is a content-related interpretation that gives meaning to this concept (vom Hofe et al. 2005). In Sect. 3, we present four BMMs for definite integrals: an integral can be interpreted as the oriented value of an area, as the total variation of a quantity, as the accumulation of a large number of small pieces, and in terms of an average value. BMMs are prerequisites for dealing with a mathematical concept in an insightful way; they “capture the substance of the mathematical content” (Hefendehl-Hebeker et al. 2019, p. 31).

By contrast, mathematical definitions of definite integrals do not refer to such interpretations but are instead based on mathematical aspects of the concept. A definite integral can be defined as the supremum of lower sums and the infimum of upper sums. For continuous functions, it can be specified as a limit value of Riemann sums. Such mathematical aspects of a concept are subject-related parts of the concept that can be used to define or characterize it from a mathematical point of view (Weigand et al. 2017). The relationship between mathematical aspects and BMMs is described in Sect. 2.3.

2.2 Normative and individual basic mental models

In didactics of mathematics, the term BMM is used for both normative and descriptive work (vom Hofe et al. 2005; Hefendehl-Hebeker et al. 2019). Normative refers to which BMMs students should ideally have, and descriptive refers to which BMMs they actually have in practice.

  • Normative BMMs are interpretations of a mathematical concept that learners should generally and ideally develop. These BMMs are identified by didactic analyses of the mathematical concept (see Sect. 3 for BMMs of definite integrals). They can be used as educational guidelines and to specify learning objectives for mathematics lessons. This can provide orientation to teachers when designing and organizing their lessons.

  • Individual BMMs are the specific manifestations of normative BMMs in a person. They result from individual learning processes, and the extent to which they deviate from normative BMMs can vary. By observing students as they do mathematical work and analysing their oral or written expressions, one can try to draw conclusions about their individual BMMs. In mathematics lessons, this can be a starting point for teaching and support measures with the aim of developing individual BMMs as appropriately as possible with regard to normative BMMs.

In this paper, we describe normative BMMs of the concept of a definite integral and develop a test to verify the existence of these (normative) BMMs in the thinking of students while solving problems. We do not investigate the characteristics or coincidences of the individual BMMs.

2.3 Relation to “concept image—concept definition”

The concept of BMMs is related to the widely known and established theoretical concept of “concept image—concept definition”. The latter has been established in the didactics of mathematics since at least the 1980s to distinguish between the formal aspects of a concept and the conceptual ideas associated with it. Tall and Vinner (1981) described the concept image as “the total cognitive structure that is associated with the concept, which includes all the mental pictures and associated properties and processes. It is built up over the years through experiences of all kinds, changing as the individual meets new stimuli and matures” (p. 152). By contrast, they regarded a concept definition “to be a form of words used to specify that concept” (ibid. p. 152). They further differentiate facets of the term “concept definition”: a formal concept definition is formulated in a mathematically precise manner, as is common in mathematics books, for example. In contrast, a self-formulated definition developed by an individual is called a personal concept definition. It may of course coincide with a formal concept definition, but it does not have to.

The relationship between concept images and concept definitions has been studied for various mathematical concepts (e.g. functions, limits), particularly with regard to the close connection between teaching and students’ developed concepts (Bingolbali and Monaghan 2008). One problem that was repeatedly revealed is that a concept definition may be associated with a very limited concept image. Moreover, there is the risk for learners that a concept image that is too narrowly restricted to specific cases may lead to incorrect conclusions about a concept definition and thus about general principles (Vinner 2011).

Figure 1 shows a model that illustrates relationships between BMMs, mathematical aspects, concept definitions, and concept images. We explain it starting from the left and going clockwise, while referring to the mathematical concept of the definite integral as an example.

Fig. 1 Relations between BMMs, mathematical aspects, concept definitions, and concept images

  • A mathematical aspect can provide the content of a formal concept definition. Conversely, a formal concept definition uses a mathematical aspect to characterize a mathematical concept. For example, a formal concept definition of the definite integral may be based on the mathematical aspect of the supremum of lower sums and infimum of upper sums.

  • Formal concept definitions are a subject-related basis for teaching and learning processes (e.g. in school) that develop personal concept definitions. After successful learning, the latter should correspond to the former, at least with regard to their subject-related core. For example, students should be able to formulate a personal concept definition of the definite integral in which basic ideas about lower and upper sums are adequately expressed.

  • Personal concept definitions are expressions of components of the individual concept image. The latter gives meaning to a personal concept definition for the specific individual. For example, a person’s concept image of definite integrals is the entire cognitive structure that is associated with this mathematical concept. It is the basis for formulating personal concept definitions.

  • A person’s concept image may comprise several individual BMMs of the respective mathematical concept. These BMMs are components of the concept image. For example, the individual concept image of a person for definite integrals might encompass the BMMs of area and (re)construction (see Sect. 3).

  • Individual BMMs are manifestations of normative BMMs in a person. The latter form a didactic basis for teaching and learning processes that develop individual BMMs. For example, a teacher should have didactic knowledge about normative BMMs of the definite integral in order to design mathematics lessons in such a way that students develop individual BMMs adequately.

  • Normative BMMs are interpretations of mathematical aspects of a concept and give them meaning that is independent of specific individuals. Conversely, mathematical aspects of a concept provide a basis for identifying normative BMMs through subject-related didactic analyses. For example, in Sect. 3, we present the results of one such analysis of the concept of definite integral.

The terminology presented in Fig. 1 can be assigned to two areas, namely, mathematics and didactics on the one hand and the cognitive structures of individuals on the other. The figure also shows relationships between these two areas. Mathematical aspects and formal concept definitions belong to the area of mathematics. They are the subject-related basis of mathematical education. Normative BMMs provide normative answers to the question of what students should imagine about mathematical concepts from a didactic perspective. In contrast, concept images, personal concept definitions, and individual BMMs all relate to specific individuals. These terms can therefore be used on a descriptive level when trying to determine and describe the cognitive structures that a specific person has about a mathematical concept. The test instrument presented in Sect. 4 operates at this level.

3 Normative basic mental models and approaches to the concept of an integral

To reduce the problems identified with learning integral calculus and promote a sound understanding of the concept of integral, we focus on describing normative BMMs of the definite integral that are useful for developing this concept. As a starting point, we use the concept of the definite integral over a closed interval for a bounded function, as known from school mathematics. The structure of a model with four BMMs of the integral is explained below. This structure has already been described in detail (Greefrath et al. 2016) and fits the typical approaches to integral calculus. However, a differently accentuated viewpoint can also lead to differently accentuated BMMs. Two of the described BMMs are typically used to approach the concept of integral in German school textbooks.

3.1 The basic mental model of area (AR)

The BMM of area interprets the definite integral of a function f as the oriented value of the area enclosed by the graph of the function f on the interval \([a,b]\) and the x-axis in the Cartesian plane. (To define the oriented area, regions above the x-axis are taken to be positive, and regions below the x-axis are taken to be negative.) When interpreting the definite integral as an area, the concept of integral is linked to experiences from everyday life, since “the act of measuring area has always clearly had elements of everyday life” (Bender 1991, p. 51).

One classical approach to the concept of integral focuses on determining the area under function graphs and developing the BMM of area. A typical introductory example is to calculate the area under the graph of a function f (often \(f(x)=x^{2}\) on the interval \([0,1]\)). To do this, the interval is divided into multiple subintervals, typically chosen to be of equal length. As the number of subintervals increases, the approximation becomes more accurate for integrable functions, and the difference between the upper and lower sum decreases. Defining the integral as the (same) limit of the upper and lower sums then seems natural.
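The introductory example can be made concrete with a short computation. The following is a minimal sketch, assuming \(f(x)=x^{2}\) on \([0,1]\) and subintervals of equal length; the function name lower_upper_sums is our own illustrative choice. Because f is increasing on this interval, the lower sum uses the left endpoints and the upper sum uses the right endpoints.

```python
from fractions import Fraction

def lower_upper_sums(n):
    """Lower and upper sums of f(x) = x^2 on [0, 1] for n equal subintervals."""
    width = Fraction(1, n)
    # f is increasing on [0, 1]: the minimum on each subinterval is attained
    # at the left endpoint, the maximum at the right endpoint.
    lower = sum((i * width) ** 2 * width for i in range(n))
    upper = sum(((i + 1) * width) ** 2 * width for i in range(n))
    return lower, upper

for n in (10, 100, 1000):
    lower, upper = lower_upper_sums(n)
    # The gap between upper and lower sum shrinks as n grows.
    print(n, float(lower), float(upper), float(upper - lower))
```

In this sketch, the difference between the upper and lower sum is exactly 1/n, and both sums enclose the value 1/3 of the integral.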

When determining the area under a function graph, the BMM of area is emphasized, although the BMM of accumulation (see below) is also addressed by approximating the area with rectangular strips. The BMM of area results from the special case \(f(x)\ge 0\), which can, however, lead to an inappropriate generalization (Bezuidenhout and Olivier 2000, p. 78). A widespread misconception is that “the integral is an area, and area is always positive” (Kouropatov and Dreyfus 2013, p. 643); this was clearly demonstrated, for example, by the TIMSS study (cf. Baumert et al. 1999, p. 80; see also Jones 2013, p. 138; Sealey 2006, p. 52). Students may need more time and a deeper understanding if they are to apply this BMM appropriately and link the aspects of ‘area under a curve’ and the Riemann sum (Engelke and Sealey 2009, p. 4).

3.2 The basic mental model of (re)construction (RE)

The BMM of (re)construction, which establishes a link between mathematics and reality, is activated by application-oriented problems. The BMM of (re)construction considers the definite integral of a function f representing the rate of change of a quantity as the total variation of this quantity on a given interval.

In the context of the integral, construction and reconstruction refer to both the (re)construction of a quantity from given data about the rate of change or speed and the (re)construction of an antiderivative from a given function (Bender 1990). State and variation are important categories in this context (Hahn and Prediger 2008, p. 178). The fundamental theorem of calculus links the perspectives of construction and reconstruction: the “function F that we obtain is the same, regardless of whether we view the accumulation … as a new construction given another function f or … as the reconstruction of an antiderivative” (Tietze et al. 2000, p. 287). The BMM of (re)construction is therefore considered to be particularly important for an adequate understanding of the integral (Büchter and Henn 2010, p. 92; Danckwerts and Vogel 2006, p. 98 ff.). There are also links to differentiation: “A basic understanding of differentiation as a local rate of change is key for understanding integration as reconstruction” (Danckwerts and Vogel 2006, p. 125).

The BMM of (re)construction, when placed in the context of a real situation, can be used as an approach to integral calculus: the total variation and the state are reconstructed from a given rate of change. Exercises typically ask students to reconstruct a distance from known speed data (e.g., Hußmann 2007, p. 343) or the current water volume from data on the inflow and outflow rates (cf. Fig. 2). Because the rates in the diagram are piecewise constant, the task can be solved without resorting to areas, and in particular before the integral is introduced in school: on each time segment, the inflow rate is multiplied by the duration to obtain the change in water volume. Of course, the quantitative (re)construction in a specific problem requires a starting and an end point of the situation considered, e.g., a starting and an end time; that is, it requires an interval to which the (re)construction, and hence the definite integral, refers.

Fig. 2 Reconstruction of the water volume from inflow and outflow rates in a textbook (Schmidt et al. 2011, p. 144)
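The reconstruction from piecewise constant rates can be sketched in a few lines. The data below are hypothetical and are not taken from the textbook figure; the function name reconstruct_volume and the units (litres per minute, minutes) are assumptions made for illustration only.

```python
def reconstruct_volume(initial_volume, phases):
    """Return the water volume at the start and after each phase.

    phases is a list of (rate, duration) pairs with a constant net inflow
    rate per phase; a negative rate stands for outflow.
    """
    volumes = [initial_volume]
    for rate, duration in phases:
        # For a constant rate, the change of the quantity is rate * duration.
        volumes.append(volumes[-1] + rate * duration)
    return volumes

# Hypothetical data: 10 l/min for 3 min, no flow for 2 min, -5 l/min for 4 min.
phases = [(10, 3), (0, 2), (-5, 4)]
volumes = reconstruct_volume(0, phases)
print(volumes)                      # [0, 30, 30, 10]
print(volumes[-1] - volumes[0])     # total variation over the whole interval: 10
```

The list of intermediate states corresponds to the reconstructed quantity, while the difference between the last and the first state corresponds to the total variation on the interval.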

If the state and the rate of change are emphasized by introducing the integral in an applied context, the BMMs of reconstruction and also of accumulation (see below) are most important. The BMM of area plays only a subordinate role, since exercises typically do not ask students to determine the area but instead to calculate other quantities, such as the volume of water or the distance travelled. Nevertheless, it is possible to reconstruct the area from a function describing a graph analogously to the ‘net water flow’ problem (cf. Fig. 2). For instance, Huang (2012) invoked the (re)construction conception (cf. Huang 2012, p. 163) and—as part of a qualitative case study—was able to show that interweaving the BMMs of (re)construction and area can lead to an advanced understanding of the concept of integral.

The following two BMMs are usually not used to approach integral calculus. However, they are often discussed in connection with applications and more in-depth studies—including in German textbooks.

3.3 The basic mental model of average (AV)

Against the backdrop of the mean value theorem of integral calculus, the integral can (also) be understood in connection with average values. If the function f is continuous on the interval \([a,b]\), then there exists a \(\xi \in [a,b]\) such that \(f(\xi)=\frac{1}{b-a}\int_{a}^{b}f(x)\,dx\) (cf. Walter 2004, p. 208). Geometrically speaking, the oriented area under the graph of f over the interval \([a,b]\) equals the oriented area of the rectangle of width \(b-a\) and height \(f(\xi)\) (cf. Fig. 3). The value of the integral of a given function on an interval divided by the length of this interval is therefore the ‘average function value’ on this interval. This BMM of average of the integral can be interpreted as a generalization of the arithmetic mean. For this purpose, we consider a function f that is continuous on the interval \([a,b]\). The interval is partitioned into \(n\) subintervals of equal length, and one element \(x_{i}\) is selected from each subinterval (\(1\le i\le n\)). The arithmetic mean \(m=\sum_{i=1}^{n}f(x_{i})\cdot \frac{1}{n}=\frac{1}{b-a}\sum_{i=1}^{n}f(x_{i})\cdot \frac{b-a}{n}\) is \(\frac{1}{b-a}\) times a Riemann sum of f and therefore approximates \(\frac{1}{b-a}\int_{a}^{b}f(x)\,dx\).

Fig. 3 Link between the integral and the average value given in a textbook (Schmidt et al. 2011, p. 162)
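The link between the arithmetic mean of function values and the integral can be checked numerically. The following is a minimal sketch under our own assumptions: the function \(f(x)=x^{2}\) on \([0,3]\) is a hypothetical example, the midpoint of each subinterval is chosen as \(x_{i}\), and the name mean_of_samples is illustrative.

```python
def mean_of_samples(f, a, b, n):
    """Arithmetic mean of f at one sample point per subinterval of [a, b]."""
    width = (b - a) / n
    # Assumption: the midpoint of each subinterval is chosen as x_i.
    samples = [f(a + (i + 0.5) * width) for i in range(n)]
    return sum(samples) / n

def square(x):
    return x ** 2

# Hypothetical example: the average value of x^2 on [0, 3] is
# (1 / (3 - 0)) * (3^3 / 3) = 3.
for n in (4, 40, 400):
    print(n, mean_of_samples(square, 0.0, 3.0, n))
```

As n grows, the printed means approach the average function value 3, which is the integral divided by the length of the interval.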

Emphasizing the BMM of average promotes stronger connections between integral calculus and stochastics, since the expected value of an integrable random variable X is defined by \(E(X)=\int_{\Omega}X\,dP\) and is a generalization of this average (Danckwerts and Vogel 1986, p. 113). This BMM can also be linked to the ‘average rate’, since this places greater emphasis on the average. This fulfils a recommendation from Bezuidenhout et al. (1998, p. 101), who found that students are not sufficiently informed about the concepts of ‘average rate’ and ‘average value’. Weitendorf (2007) also underlined that “understanding how to calculate the average of a continuous quantity requires a geometric interpretation via a comparison with the area, which presupposes an integrated or perhaps even advanced conceptual understanding” (p. 94). However, it should be noted that the BMM of average plays only a minor role in the general curriculum (Tietze et al. 2000; Danckwerts and Vogel 2006).

3.4 The basic mental model of accumulation (AC)

The BMM of accumulation considers the definite integral of a function as the limit of a sum with a large number of small terms. The idea does not focus on the limit value itself but on the accumulation before the limit value is calculated. In general, accumulation is understood to mean ‘aggregation’ or ‘collection and storage’. From the perspective of the BMM of accumulation, the integral is viewed as a product sum that collects or accumulates a large number of partial products over an interval. This perspective of summation tends to emphasize the process rather than the result of integration. The geometric illustration of the BMM of accumulation corresponds to viewing the “integral as the limit of a sum of (the area of) rectangular regions, making the steps of the staircase arbitrarily narrow by taking the limit” (Blum and Törner 1983). This BMM is similar to the BMM of area, but it is more general in that it invokes additional aspects (Blum and Kirsch 1996). The BMM of accumulation therefore occupies a privileged position (Thompson and Silverman 2007, p. 117).

Kouropatov and Dreyfus (2013) proposed that the integral should be introduced in terms of accumulation, since it is not only central to the concept of integration, but also represents the essence for understanding many ideas and applications from calculus (e.g., curve length and volume).

A typical example from a textbook demonstrates how to calculate the volume of a body of rotation using the BMM of accumulation (see Fig. 4).

Fig. 4 Calculating a body of rotation in a textbook (Schmidt et al. 2011, p. 172)
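The accumulation of many small pieces can be illustrated for a body of rotation. The sketch below does not reproduce the textbook example; it assumes a hypothetical function \(f(x)=\sqrt{x}\) on \([0,4]\) rotated about the x-axis, and it accumulates thin disks of volume \(\pi f(x_{i})^{2}\,\Delta x\). The name disk_volume is our own.

```python
import math

def disk_volume(f, a, b, n):
    """Approximate the volume of the solid obtained by rotating the graph
    of f on [a, b] about the x-axis, accumulating n thin disks."""
    thickness = (b - a) / n
    volume = 0.0
    for i in range(n):
        x_i = a + i * thickness  # left endpoint of the i-th subinterval
        # One thin disk: base area pi * f(x_i)^2 times thickness.
        volume += math.pi * f(x_i) ** 2 * thickness
    return volume

# Hypothetical example: f(x) = sqrt(x) on [0, 4]; the exact volume is 8 * pi.
for n in (10, 100, 1000):
    print(n, disk_volume(math.sqrt, 0.0, 4.0, n), 8 * math.pi)
```

The emphasis here is on the process: each pass through the loop adds one partial product, and only the comparison with \(8\pi\) refers to the limit value.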

The idea of approximation is important in developing a conceptual understanding of the definite integral (Sealey and Oehrtman 2005, p. 83). It is often considered together with accumulation (Kouropatov and Dreyfus 2014), although it can theoretically be linked to other BMMs as well, such as the BMM of area.

In many concrete examples of the BMMs of the integral, physical quantities such as inflow rate, speed, and temperature play a central role. Furthermore, the principle of accumulation is used for many physical quantities such as energy.

Huang (2012, p. 167) is undoubtedly correct in claiming that it would be questionable to restrict the concept of integration to any one or just a few perspectives or BMMs. Instead, it is important when learning calculus to cover diverse perspectives of the definite integral “in order to establish a more enlightening concept” (Rasslan and Tall 2002, p. 8).

4 A test on basic mental models of integrals and the empirical results of a study

In Sect. 2, we presented theoretical considerations about basic mental models (BMMs); in Sect. 3, we described and discussed four normative BMMs of the definite integral. Based on this theoretical work, it is natural to ask to what extent these BMMs can be identified empirically. Another question of interest is what BMMs students have actually developed in practice, and how these findings can be reconciled with existing empirical findings. Our research questions are as follows:

Q1: How can the structure of the BMMs of the integral be described empirically against the background of the theoretical model?

In order to pursue this research question, it is necessary to first determine whether the four postulated BMMs (AR, RE, AV, AC; cf. Sects. 3.1–3.4) can be measured with sufficient reliability. We designed our instrument so that it does not work with distractors, but only with attractors, which are to be rated subjectively with respect to familiarity.

Q2: What types of BMMs of the integral can be found among first-year university students?

On the basis of empirical evidence from the first step, it is of course interesting to determine the distribution of BMMs among students as a second step. In particular, one should look for patterns that relate student parameters (such as membership of a particular group) and BMM intensity.

In order to discuss the two research questions, the development and structure of a test designed to answer these questions are presented below, as well as the procedure followed to verify the test’s quality, and in particular its reliability and validity. We then present the results of a study with mathematics students.

4.1 Developing the test instrument

In order to determine the presence of individual BMMs and establish which of them are expressed in what ways, we developed a test consisting of eight exercises involving the BMMs of derivatives and eight exercises focusing on the BMMs of definite integrals. The test was piloted with 171 first-year university students.

The key idea of the test for measuring BMMs is to present the participants with arguments that use certain BMMs and ask whether the style of argumentation reflects the participants’ own lines of thought. Each exercise consists of a prompt (e.g., a question) and four correct answer options (corresponding to the four BMMs). The tasks were selected with the objective that they should be interpretable in terms of all four BMMs. Ideally, it should be possible to understand all tasks equally well with arguments based on any of the four BMMs, so that, if a student prefers a specific BMM, his or her answer should not be influenced (much) by the item itself, but only by the student’s preference for the BMM. This is hard to achieve in practice, and some items may have an intrinsic bias for particular BMMs. The participants were informed that all four answers are correct and, for each item, were asked to rate—on a 5-point Likert scale ranging from (− −) to (++)—to what extent each answer matched their own manner of thinking. An example of the test structure and our approach is shown by the exercise in Table 1.

Table 1 Samples of items

4.2 Validity of the test instrument

The validity of the test was verified by evaluating the content of interviews with experts. The experts were mathematics educators at universities with a focus on secondary education. We received feedback from five experts. For the 32 items across eight exercises, the experts were asked to assign a suitable BMM for definite integrals (cf. Sects. 3.1–3.4). In addition, an opportunity for free-response feedback was included, which was taken advantage of by most of the experts. The experts worked independently of one another. In addition to the comprehensibility of the items, they were asked to check whether each of the four items of an exercise also represents one of the four different BMMs well.

The intercoder reliability, calculated with the software package R, was ‘almost perfect’, with a value of \(\kappa = 0.874\) (cf. Landis and Koch 1977, p. 165). This result indicates that each item can be assigned accurately to one of the four BMMs.
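For readers unfamiliar with this agreement measure, the following sketch shows how Fleiss' kappa can be computed for several raters and nominal categories. It is a plain Python illustration and not the R code used in the study; the ratings are hypothetical and do not reproduce the expert data, and the function name fleiss_kappa is our own.

```python
from collections import Counter

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the category labels that the
    raters assigned to item i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    # Observed agreement per item: proportion of agreeing rater pairs.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items
    # Expected agreement by chance, from the overall category proportions.
    p_categories = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in p_categories)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical ratings of four items by five raters (not the study data).
ratings = [
    ["AR"] * 5,
    ["RE", "RE", "RE", "RE", "AC"],
    ["AV"] * 5,
    ["AC"] * 5,
]
print(fleiss_kappa(ratings, ["AR", "RE", "AV", "AC"]))
```

A value of 1 indicates complete agreement, whereas values near 0 indicate agreement no better than chance.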

One comment given in the free response field was that the assignment may be influenced by so-called signal words that clearly guide towards certain BMMs, which potentially influences the quality of Fleiss’ Kappa according to Landis and Koch (1977). However, this feedback should not be problematic for the test itself, where the BMMs are not assigned to the items, unlike in the expert survey. Instead, the BMMs are designed to be activated by the items. The expert feedback about the test instrument on BMMs for definite integrals was interpreted as a confirmation of the thematic validity of the test.

4.3 Reliability of the test instrument

To verify the reliability of the test, a sample collected in Münster in 2018 was used. The sample consisted of 171 students studying to become primary school teachers. The average age was 20.7 years; 95% were in their first semester, and 87% attended school in North Rhine-Westphalia.

Complete data on the Likert scales for integral calculus were collected from n = 158 participants (i.e. none of the integral calculus items were left blank). This group formed the basis for the evaluations performed below. All calculations were performed with R 3.6.1 using OpenBLAS and the lavaan, semTools, semPlot, irr, psych, Lambda4, and MVN packages.

Inspection of the data revealed skewed distributions for many items. The Shapiro–Wilk test confirmed that the responses were not normally distributed for any item. However, when the summed scores (averages) were calculated over the eight items assigned a priori to each of the four BMMs, the assumption of a normal distribution was not violated for RE and AV, and was only slightly violated for the other two.

The averages and standard deviations of the eight items for each BMM are shown in Table 2 (on a scale from 0 to 4, where 2 is neutral).

Table 2 Summed scores of the items for each BMM

The correlation matrix is shown in Table 3. None of the correlations are particularly high. We interpret this finding to indicate that BMMs can be validly modelled as a multidimensional construct.

Table 3 Correlations

If all items of a given scale (i.e. BMM) measure the construct equivalently, the reliability can be estimated with Cronbach’s alpha and Guttman’s lambda 4, as shown in Table 4. The poor results can be explained either by low reliability or by a violation of the prerequisite of \(\tau\)-equivalent measurement models (see Cho 2016; Danner 2015). These metrics are therefore not suitable in this case. Instead, we fitted a reflective measurement model using the SEM package lavaan.

Table 4 Scale homogeneity
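As a reminder of how Cronbach's alpha is obtained from item scores, the following sketch implements the standard formula \(\alpha =\frac{k}{k-1}\left(1-\frac{\sum_{j}\mathrm{var}(x_{j})}{\mathrm{var}(\sum_{j}x_{j})}\right)\) for k items. It is an illustration in plain Python rather than the R code used for Table 4, and the scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores[p][j] is the score of person p on item j."""
    k = len(scores[0])
    # Sample variance of each item across persons.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed score per person.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings of five persons on three items (scale from 0 to 4).
scores = [
    [4, 3, 4],
    [2, 2, 1],
    [3, 4, 3],
    [1, 0, 2],
    [3, 3, 2],
]
print(cronbach_alpha(scores))
```

Alpha equals 1 when all items yield identical scores; it is an appropriate reliability estimate only under the \(\tau\)-equivalence assumption discussed above.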

First, we created and fitted a reflective measurement model for each BMM. Reflective measurement models are the most common kind of measurement models in structural equation analysis (Hoyle 2012, p. 119). They model a construct by a latent variable that is interpreted as the cause of some observable measures. In our case, for example, the construct of AR is modelled as a latent variable with indicator variables given by the scores \(x_{1},\dots ,x_{8}\) of the eight items that are associated with the construct AR. This is an adequate measurement model in our case, because we hypothesize that the ability to think according to one BMM can be represented by a one-dimensional scale and that different items differ to some extent in the intensity with which they evoke a particular BMM (this intensity is measured by the path coefficients). The reflective measurement model in algebraic form is then \(x_{i}=\lambda_{i}\cdot \mathrm{AR}+\epsilon_{i},\ i=1,\dots ,8,\ \mathrm{var}(\mathrm{AR})=1\), where the \(\epsilon_{i}\) are error variables (centred, normally distributed, pairwise independent). Estimating this model gives estimates for both the path coefficients \(\lambda_{i}\) and the error variances \(\mathrm{var}(\epsilon_{i})\), as well as fit indices that allow the researcher to judge how well the model fits the data. The results showed that some items undermined the model fit (as well as the reliability). A total of six items were therefore removed from the test. In each case, it was possible to come to a thematic understanding of why the items did not work effectively. In particular, the technical language of the items took some getting used to: from school, students are less accustomed to technical language than to the language of education. This aspect was a challenge for many of the students.

After optimizing the measurement models in this way, a latent variable with variance 1 was defined to measure the construct that loads on the remaining 6 to 7 items per scale. This resulted in the figures presented in Table 5. (The method of parameter estimation was the lavaan estimator MLR, namely robust maximum likelihood. The violation of the model assumptions of a strict maximum likelihood approach suggests that this is advisable. Calculations were performed with even more robust variants, which delivered better values for the model fit, meaning that the values given in Table 5 can be considered very reliable.)

Table 5 Scale qualities after the first optimization

These data can be interpreted as follows. The measurement models all show good to very good model fits; in particular, RMSEA ≤ 0.05 indicates a ‘close fit’ (Hoyle 2012, p. 217). This is a confirmation that each of the four BMMs is one-dimensional. As expected, the reliabilities (alpha and omega), estimated with the reliability function from the semTools package for R, are not very high. For these congeneric reliabilities, Bagozzi and Yi (1988) require a value of at least 0.6. The BMM RE does not meet this condition but, since the structure model is a good fit, we accept this model below.
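As an illustration of how a congeneric reliability is obtained from a fitted one-factor model, the sketch below applies the common composite-reliability formula, \(\omega = (\sum {\lambda }_{i})^{2} / ((\sum {\lambda }_{i})^{2} + \sum \mathrm{var}({\epsilon }_{i}))\), for a latent variable with variance 1 and uncorrelated errors. This is an assumption about the general form only. semTools reports several omega variants, and the helper name and input values here are hypothetical.

```python
def congeneric_omega(loadings, error_variances):
    """Composite reliability of a one-factor model with latent variance 1.

    Assumes uncorrelated errors; loadings and error variances are the
    unstandardized estimates for the items of one scale.
    """
    if len(loadings) != len(error_variances):
        raise ValueError("one error variance per loading is required")
    # Variance of the sum score explained by the latent variable.
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(error_variances))


# Hypothetical estimates for a four-item scale (not values from Table 5).
omega_demo = congeneric_omega([0.6, 0.5, 0.7, 0.4], [0.64, 0.75, 0.51, 0.84])
```

With these illustrative inputs the scale lies only slightly above the threshold of 0.6, which shows how moderate loadings on few items limit the attainable reliability.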

As a second step, an overall model was fitted with four latent and normally distributed variables that are correlated but not linked by regression equations. Each of the four latent variables loads on the items listed in the above table. A graphical representation of the model would therefore be a disjoint union of the four sub-models used above. However, since the assumption of a multivariate normal distribution is even more strongly violated across all 32 items than for the individual scales (as verified with the R package MVN), estimation using maximum likelihood methods is no longer possible. In a comparative study of Likert scales, Tarka (2017) recommends the estimation methods WLSM and WLSMV. According to Hoyle (2012, p. 495ff), these methods are also suitable for ordered categorical variables. For a model without structural equations between the latent variables, the meaningfulness requirement on the minimum sample size of \(n>p\cdot (p+1)/2\) is satisfied with p = 8.
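The bound can be evaluated directly. Taking p = 8 as stated, the right-hand side, which counts the distinct variances and covariances of p observed variables, is

\[
\frac{p\cdot (p+1)}{2}=\frac{8\cdot 9}{2}=36,
\]

so the requirement amounts to \(n>36\), that is, at least 37 participants.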

Lavaan estimates the model as follows with the WLSM method (robust weighted least squares):

CFI = 0.98, TLI = 0.97, RMSEA = 0.02, RMSEA robust = 0.04, SRMR = 0.08, \({\chi }^{2}\) p = 0.24.

The correlations between the latent variables are shown in Table 6. This shows that at least the BMM of area is clearly separated from the others. The high correlation of the BMM of reconstruction with the BMM of average is also particularly striking.

Table 6 Correlations between latent variables

The reliability of the test measurement is good for at least three BMMs (AR, AV, AC), and they can be separated from one another. In our test, the BMM of reconstruction (RE) was insufficiently measured and insufficiently separated from the BMM of average (AV). Why these two scales are relatively closely linked can be explained, albeit hypothetically, in terms of the correlations between the items.

As a consequence, the scale formation was reviewed on the basis of these results, and the BMM assigned to one item was reclassified. The exercise in question focuses on calculating the volume of a body using a given function describing the cross-section of the body as a function of the height. The explanation associated with the BMM of average—“The expression corresponds to the average cross-sectional area of the body multiplied by its height”—correlated more strongly with the other reconstruction items than with the other average items. To experts, the expression ‘average cross-sectional area’ may have signalled that the BMM of average was intended. However, for participants, this connection was not clear because, as mentioned above, the BMM of average is only marginally promoted in class, if at all. When discussing the volume of bodies, many textbooks heavily emphasize Cavalieri’s principle. Against this background, the following line of thought seems plausible: if the body is made ((re)constructed) from many different slices, then, to calculate the volume, it does not matter whether they are of different sizes or have a constant average area. Accordingly, the BMM of reconstruction also applies to this item. After a detailed discussion of the item, it was therefore reclassified to the BMM of reconstruction. After modifying the scales in this way, the values in Table 7 were obtained.

Table 7 Scale qualities after optimization by moving an item

Both scales have the same homogeneity level, and the measurement models show a good model fit. A good model fit was also obtained for the overall model with CFI = 0.91, TLI = 0.89, RMSEA = 0.02, RMSEA robust = 0.04, SRMR = 0.08, \({\chi }^{2}\) p = 0.35.

4.4 Basic mental models of the integral in first-year students

Once our test instrument passed the reliability and validity criteria, we applied it to our main study. A survey of n = 428 students in first-year mathematics courses was used to verify our results from testing and from the measurement model. The participants were recruited at the beginning of their studies, in mathematics lectures at the Universities of Augsburg, Bayreuth, and Würzburg, by asking the lecturer to administer the test as part of the first lecture. This procedure aimed to ensure that the BMMs recorded in the survey reflected their state at the conclusion of their schooling. A total of 276 students were enrolled in the first semester, and most of the others were in their third semester (the average number of semesters completed was 1.39). A total of 69.2% of the participants were male.

The scale homogeneity in this group was similar to the pilot study (Table 8).

Table 8 Scale homogeneity in the study sample

Table 9 shows the average agreement (with the standard deviations in parentheses) for each scale (over an interval of [0,4], where 2 is neutral) for the entire group, as well as subdivided into first-semester students and other students. The BMM of area yielded the highest agreement. This is in accordance with expectations, since this BMM is used extensively in practice at school. The BMM of reconstruction, which the curriculum is supposed to emphasize in particular, was in second place, and significantly behind. These assessments evolved somewhat as the students’ experience with university mathematics increased. Agreement with the BMM of area decreased significantly (Wilcoxon test p = 0.037, Cohen’s d = 0.19), and agreement with the others increased non-significantly.
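The effect sizes reported here and below are values of Cohen’s d. The text does not say which variant was used, so the following sketch assumes the common definition with a pooled standard deviation; the function name and the scale scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d of two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scale scores on [0, 4] (illustrative only, not study data).
first_semester = [3.5, 3.0, 2.75, 3.25, 2.5]
higher_semester = [3.0, 2.5, 2.75, 2.25, 3.0]
d_demo = cohens_d(first_semester, higher_semester)
```

By the usual conventions, values of about 0.2 count as small effects, so the reported d = 0.19 describes a statistically significant but small shift.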

Table 9 Effect of longer university studies on the assessment of mental models (values are means, with standard deviations in parentheses) (The decrease in participant numbers from semester 1 to higher semesters is due to the fact that many participants did not explicitly state their semester and therefore this detail could not be taken into account in the evaluation)

There are significant differences between students depending on their second subject of study. In Germany, prospective teachers have to study two subjects. At most universities, students heading for a master’s in mathematics have to choose a minor subject, too. Therefore, it was natural to investigate whether this second subject had an impact. The most interesting result was obtained for the students with physics as a second subject because this subgroup was large enough and sufficiently different. The agreement values of students with physics as a second subject are shown in Table 10. Compared with students without physics as a second subject, there are significant differences in the BMM of reconstruction (Wilcoxon test p = 0.0015, Cohen’s d = 0.29) and the BMM of accumulation (Wilcoxon test p = 0.024, Cohen’s d = 0.23).

Table 10 Assessments by students with physics as a second subject

A general hypothesis is that the ability to think according to multiple basic mental models (expressed as above-neutral scores in multiple basic mental models) indicates a deeper understanding. Since the test did not contain any items on mathematical performance, this hypothesis cannot be examined directly. However, participants were asked to give their average mathematics scores from the previous academic year. This grade measures academic success across a range of mathematics topics in school, not limited to integral calculus, on a scale of 0 to 15 (higher values corresponding to better performance). The average was 11.7 points. The correlation between academic mathematics performance and the sum of all basic mental model scores (AR + RE + AV + AC) was 0.13 and differed significantly from 0. If we divide the group into (relatively) weak students with a mathematical performance of 12 points or lower and a stronger subgroup with higher performance, we observe that the overall assessment differs significantly between these groups (d = 0.19, p = 0.040). This effect is especially strong for the basic mental model of area (d = 0.26, p = 0.0002) and especially weak for AV. In this analysis, we took the sum of scores as a measure. Alternatively, one can take the number of basic mental models for which the students indicated a score above average. This measure does not differ significantly between the two groups (p = 0.061) but, when AV is omitted, this test is also significant (p = 0.022).
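The two summary measures compared in this paragraph can be written down compactly. The sketch below is illustrative only: the text does not fix whether ‘above average’ refers to the neutral point of the scale or to a sample mean, so the threshold is left as a parameter that defaults to the neutral value 2, and all names and scores are hypothetical.

```python
NEUTRAL = 2.0  # neutral point of the [0, 4] agreement scale


def sum_score(scale_scores):
    """Sum of the scale scores of one student, e.g. AR + RE + AV + AC."""
    return sum(scale_scores.values())


def count_above(scale_scores, threshold=NEUTRAL, omit=()):
    """Number of BMMs with a score strictly above the threshold.

    Scales listed in `omit` (for example "AV") are left out of the count.
    """
    return sum(
        1
        for name, score in scale_scores.items()
        if name not in omit and score > threshold
    )


# Hypothetical scale scores of one student (not study data).
student = {"AR": 3.2, "RE": 2.5, "AV": 1.8, "AC": 2.0}
total_demo = sum_score(student)
count_demo = count_above(student)
count_without_av = count_above(student, omit=("AV",))
```

The sum score uses the full information in the agreement values, whereas the count discards how far a score lies above the threshold, which may be one reason why the two measures behave differently in the group comparison.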

5 Discussion and conclusion

We were able to show that four normative BMMs of the integral could be detected and separated empirically. From a methodological perspective, the construct of BMMs was measured reliably and validly by a test, using the integral as an example. In general, we were able to answer our research questions with sufficient accuracy using our selected methodological approach. With regard to our test results, it should be noted that our test was conducted only with mathematics students. This selection of participants could distort the results, which should therefore be interpreted with care. Accordingly, we think it would be important to test the approach with other student cohorts as well. The results thus obtained can form a basis for performing and evaluating intervention studies on the acquisition of BMMs.

Concerning our results, it is first of all positive that all BMMs can be determined empirically to a certain extent. The existence of differences was to be expected, and these differences seem plausible. The distribution of the four BMMs aligned with our expectations from experience in practical school contexts. Thus, the BMM of area played the most decisive role, whereas the BMM of average played a comparatively subordinate role for learners. One critical observation is that the requirements of the upper secondary education standards introduced in Germany in 2012, which emphasize the BMM of reconstruction, have not yet been fulfilled. However, on a positive note, a change was observed in students from higher semesters, showing that preferences for BMMs can evolve over time.

Based on our findings in Table 10, it seems reasonable to suggest that the stronger preference for the BMM of reconstruction by students with physics as a second subject can be attributed to their natural exposure to dealing with physical quantities. Physical problems generally have an application-oriented background, so it seems plausible that physics students have preferences for the BMM of reconstruction in this regard. This fits with the observation in Sect. 3.4 that many examples occur in the field of physics. The stronger preference for the BMM of accumulation among students with physics as a minor subject also fits this observation.

Furthermore, the contents of the test should be extended to other topics beyond infinitesimal calculus. This could be done in various ways. The model of BMMs presented above could (additionally) be used to investigate the relationship between aspects and BMMs, with exercises on the technical understanding of integral calculus. This raises the question of the importance of BMMs when solving exercises. The role of the expression of the BMM construct in successfully completing exercises should therefore be investigated from this perspective. Against the background of the various requirements of German educational standards, this seems particularly important in order to clarify what transfer performance is required in practice. Besides integral calculus, differential calculus is the second main area of calculus. It would therefore be desirable to organize an analogous test for differential calculus, namely, for BMMs of differentiation. This could also provide an opportunity to study the relationships between the BMMs of integral and differential calculus.

Additional studies on the development of BMMs while studying mathematics might help to clarify the differences identified between first-year and more advanced students. The present study design was not sufficient to answer this question, as this would require a longitudinal study. The foundations for the development of BMMs are established during mathematics lessons. A key task is therefore to determine the potential for achieving a balanced development of BMMs in calculus classes.

We consider this study on the concept of integral to represent a starting point for empirical investigations on BMMs. The perspectives in terms of technical content, other areas besides integral calculus, and development potential in school maths lessons and possibly university studies provide considerable impetus for research over the next few years.