Algebra plays a crucial role in students’ further studies, as it is involved in many science courses and therefore conditions access to them. Since the 1990s, mathematics education research on the teaching and learning of algebra (Bednarz et al., 1996; Chevallard, 1989; Kieran, 1992, 2007; Sfard, 1991; Vergnaud et al., 1987) has made a major contribution to understanding and characterizing the learning processes in algebra and the effectiveness of teaching practices in achieving them. Nevertheless, students still encounter many difficulties in international assessments (e.g., PISA, TIMSS). In TIMSS 2019, for instance, French Grade 8 students attained an overall mathematics score of 483, below the international average of 511, and an even lower Algebra score of 468 (Le Cam & Salles, 2020).

Mathematical knowledge is highly dependent on the schools, classes, and, more generally, the institutions in which it is lived, learned, and taught, according to the didactic transposition process (Chevallard & Bosch, 2020a). Each student learns in several successive classes composed of different students, with teachers whose expectations and practices often differ, which may affect the student’s learning. Teachers, who often say they have a “good” or a “weak” class, may have to deal with significant heterogeneity in their students’ knowledge and make teaching choices to help them progress toward what is expected. To illustrate this, consider the proof problem shown in Fig. 1 and the answers of three middle school students (Table 1) at the beginning of grade 9 (14–15 years old).

Fig. 1 Example of a proof problem

Table 1 Answers of three middle school students at the beginning of grade 9 to item 6

Students A and C both state that the assertion is true, but student A gives an incorrect proof by example, an arithmetic strategy corresponding to primary school practices, whereas student C uses the algebraic strategy expected at the end of middle school. Student B also uses an algebraic strategy, but incorrectly. This example thus reveals three levels of reasoning: after two years of studying algebra, students have developed algebraic activity that may differ from one student to another and be more or less suited to their further schooling.

In addition, some researchers (Bressoux, 2012; Nye et al., 2004) have studied the potential effect of the class environment on student learning. The class effect is related to the composition of the class (the average academic level of the students, their distribution between “good” and “not so good” students, and the resulting heterogeneity) and to the number of students in the class. Their studies show that analyses of the class effect need to take the teacher’s practices into account. To contribute to our understanding of the mechanisms underlying student learning, we aim to characterize the knowledge that students build in a mathematical domain (Grugeon-Allys et al., 2018) and the changes in this knowledge according to class membership over a school year.Footnote 1

Therefore, we address the following questions: How can we account for the diversity of students’ learning in algebra? How can we summarize the diversity of algebraic learning by student and by class? How can we account for the variations in students’ learning over the course of a school year depending on the class effect, particularly the learning of other students in the class to which they belong?

To answer these questions, we look at student assessment. Assessing only success or failure on an item is insufficient. It is necessary to assess students’ knowledge and reasoning, to determine whether they are arithmetic or algebraic, and to situate them with respect to curricular expectations.

To do so, we build on the research carried out in France since the 1990s (Artigue et al., 2001) on the assessment of students in algebra and the regulation of teaching. We use the PépiteFootnote 2 automated assessment (Chenevotot-Quentin et al., 2016; Grugeon, 1997; Grugeon-Allys et al., 2018, 2022), designed jointly by teachers and by researchers in mathematics education and in interactive learning environments; the task in Fig. 1 is taken from it. Pépite aims to study students’ knowledge and reasoning holistically, from an epistemological and institutional perspective. It associates students’ answers with knowledge and errors listed a priori. Previous studies using Pépite, mainly qualitative, aimed to describe students’ learning and to help teachers manage heterogeneous classes. They highlighted recurring patterns in students’ levels of reasoning in algebra. The quantitative grade 9 study presented in this article uses Pépite to characterize both the diversity of students’ learning of algebra and its variation over the grade 9 school year, taking the class effect (in terms of learning) into account.

The rest of the paper is organized as follows: we begin by presenting the theoretical background for characterizing students’ learning of school algebra, followed by the methodology. The results are then presented in three sections, followed by a discussion of the findings.

1 Theoretical foundations

1.1 An anthropological approach

We use the anthropological theory of the didactic (ATD, Bosch et al., 2017; Chevallard & Bosch, 2020b), which is based on the hypothesis that mathematical objects do not exist per se but emerge from teachers’ practices, through the mathematical activities they develop in their classes, which may differ from one teacher to another. In ATD, any regularly performed human activity is described by a single model, the “praxeology,” in terms of types of tasks, techniques used to solve tasks of a given type, a “technological discourse” based on the knowledge and reasoning developed to justify the techniques, and “theories” that organize the local technological discourse into coherent structures. We describe the praxeologies learned by students in relation to the praxeologies to be taught at the end of middle school and at previous levels. Praxeologies are not isolated but structured in relation to each other: they are aggregated into local praxeologies around a technology, then into regional praxeologies around a theory, and finally into global praxeologies around several theories. The task in Fig. 1 concerns a praxeology of proof, and the students’ answers reveal three techniques involving three different technological discourses.

To describe the praxeologies learned by students, which are not always mathematically adequate, we link the techniques they use, especially erroneous or inappropriate ones, to the use of old knowledge or the incorrect use of new knowledge, involving kinds of errors and reasoning already highlighted by research on school algebra.

1.2 Results in the didactics of algebra

A large body of research links students’ difficulties to the discontinuities (Kieran, 2007; Vergnaud et al., 1987) that exist between arithmetic and algebraic thinking and to the specificity of algebraic semiotic practices, particularly with regard to the status of the letter, the status of equality, and the use of algebraic knowledge to solve problems. In the 1990s, teaching strategies were proposed to introduce algebraic thinking (Bednarz et al., 1996). Another approach, which emerged in the 2000s with the “Early Algebra” movement, aims to develop algebraic thinking earlier in the curriculum (Carraher et al., 2006; Kieran, 2018; Kieran et al., 2016; Radford, 2013) with a continuous trajectory from the beginning of primary school to the end of middle school.

We summarize findings on the teaching and learning of algebra according to Kieran’s (2007) model of algebraic activity, which distinguishes three types of activity. First, generative activity concerns the formation of algebraic objects (formulas, algebraic expressions, and equations) in order to solve problems (modeling, generalizing, proving). Solving these problems requires translation between different registers of representation (algebraic scripts, numerical scripts, graphical representations, geometric figures, natural language). Second, transformational activity involves transforming expressions and equations in ways that preserve their equivalence: substituting, expanding, factoring, and simplifying algebraic expressions, and solving equations. Transformational activity relies on the structure of objects (sum, product). Third, global/meta-level activity involves solving modeling, generalization, proof, and equation problems that involve analogy (Radford, 2013). It leads to algebraic justifications and proofs, as opposed to arithmetic reasoning.

These three activities call on the different statuses of letters (variables and unknowns, as opposed to letters used as labels), on equality as an equivalence relation as opposed to equality “to perform,” on the complementary procedural and structural characteristics (Sfard, 1991) of algebraic objects, and on their denotation (“Bedeutung”) and sense (“Sinn”) (Drouhard, 1992; Frege, 1971), based on their structure.

Thus, algebra is both a tool and an object. It is a tool for solving different types of problems (Chevallard, 1985, 1989; Ruiz-Munzón et al., 2013) that involve generalization, proof, modeling, and equating. It is also a structured set of objects (algebraic expressions, formulas, equations, or inequalities) with specific properties and semiotic representations associated with different registers and processing modes. The algebraic processing of these objects involves both their syntactic and semantic aspects and requires a balance between the technical and theoretical dimensions of processing.

This synthesis is the foundation of both the reference model of algebra presented next and the analysis criteria for distinguishing different levels of reasoning.

1.3 A reference model for elementary algebra

A reference model of a mathematical field (Bosch & Gascón, 2005; Ruiz-Munzón et al., 2013) is one possible way to describe the complexity of the knowledge to be taught, based on praxeologies. Based on the previous synthesis, the epistemological reference model of elementary algebra that we adopted is structured into three regional praxeologies related to algebraic expressions, formulas, and equations, which in turn are structured into five local mathematical praxeologies (Grugeon-Allys et al., 2022). We present them below, taking into account the three objects (expressions, formulas, and equations):

  1. “Modeling” praxeologies, noted M, aimed at solving problems of generalization (algebraic expressions), modeling (formulas), and equating (equations)

  2. “Proving” praxeologies, noted P, aimed at proving properties

  3. “Calculating Numerically” praxeologies, noted CN, aimed at calculating numerical expressions and recognizing equalities

  4. “Calculating Algebraically” praxeologies, noted CA, aimed at operating on algebraic expressions (substituting, recognizing, expanding, factoring) and equations (testing, solving an equation)

  5. “Representing” praxeologies, noted R, aimed at translating a relation between definite and indefinite elements from one register of semiotic representation to another, or at associating several representations of an expression or equation in different semiotic registers.

The reference model is used to analyze and link the praxeologies to be taught, taught, and learned. The five local praxeologies structure the analysis. As the proof task presented in the introduction shows, students construct praxeologies that are not always mathematically correct or consistent with what is expected at the end of middle school. We therefore try to describe the levels of reasoning on which they rely in the different praxeologies.

1.4 Assessment of students’ algebraic learning

Assessing students’ learning requires us to consider that students develop new praxeologies concerning algebraic expressions and equations in their training at middle school in interaction with the praxeologies taught by their teachers. We assess learning in a holistic way (Vergnaud, 2009). We characterize the students’ learned praxeologies based both on their answers to tasks covering all the local praxeologies to be taught and on the way in which they solve them (techniques, knowledge and reasoning). For this purpose, we use the 24 tasks of the Pépite test covering the praxeologies of the reference model in algebra (Table 2). One task can mobilize several local praxeologies.

Table 2 Praxeologies involved in the 9th grade level Pépite Test (24 items)

The tasks are multiple-choice or open-ended. Figure 1 shows the sixth Pépite task; the Appendix and Grugeon-Allys et al. (2018) provide other examples. The analysis is fully automated: students answer on the computer, and an algorithm analyzes and codes their answers, both closed and open (Grugeon-Allys et al., 2018). Moreover, Pépite does not only assess the validity of each student’s answers task by task; it also assesses them against what is expected in the curriculum. Indeed, an a priori analysis of each task also enables an assessment of the knowledge and reasoning the student uses to justify his or her answer, for each praxeology.

1.5 Technological-theoretical levels of students’ algebraic learning

We assess the learned praxeologies according to four technological-theoretical levels, called θ-levels (Grugeon-Allys, 2016), on each local praxeology of the reference model. Based on the epistemological study of algebra presented above, we define these θ-levels a priori and rank them hierarchically, as shown in Table 3. These θ-levels enable us to distinguish the knowledge and reasoning used by a student for each task of a given type. They are then used to code the students’ answers for each local praxeology, as described in Table 4.

Table 3 Description of the θ-levels for algebra
Table 4 Characterization of θ-levels according to local praxeologies concerning algebra

For a given school level, the possible student answers to each item are listed and coded in an a priori analysis according to whether they are correct (V1 or V2 codingFootnote 3) or not (V3 coding) and according to the θ-level they involve in each praxeology (Grugeon-Allys, 2016; Grugeon-Allys et al., 2022). Tables 5 and 6 show an excerpt from this analysis for the item shown in Fig. 1. We indicate the most frequent techniques (the list is not exhaustive) identified in the didactic analysis and in students’ recurrent answers. These techniques may be based on primary school arithmetic or on algebra, and they depend on whether the semiotic representations used are correct. Each technique is associated with its corresponding θ-level.

Table 5 A priori analysis of primary arithmetic techniques
Table 6 A priori analysis of algebraic techniques

The answer analysis of the three students presented in Table 1 is detailed in Table 7.

Table 7 Analysis and coding of the answers of three students to item 6

Because all tasks are coded with the same θ-levels, the analysis can be conducted at a macroscopic level across all tasks rather than at a microscopic, task-by-task level. We can then identify the knowledge and reasoning that students predominantly use on all the tasks related to the same local praxeology. A student’s learned praxeology in algebra is therefore described by his or her percentage of successful tasks and by the θ-levels on each of the five local praxeologies, that is, by a sextuplet.Footnote 4
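As an illustration of this description, and not of Pépite’s internal format, the sextuplet can be sketched as a small Python record. The class name, the field names, and the assumption that each of the five praxeologies (M, P, CN, CA, R) can take any of the four θ-level names used in the results sections are ours. Under that assumption, the θ-level part alone admits 4^5 = 1024 combinations, which gives an idea of how large the space of possible learned praxeologies is before the success rate is even considered.

```python
from dataclasses import dataclass

# Assumed labels: the four θ-level names used in the results sections.
THETA_LEVELS = ("Adequate", "Weakly-Adequate", "Under-Construction", "Old")
# The five local praxeologies of the reference model.
PRAXEOLOGIES = ("M", "P", "CN", "CA", "R")


@dataclass(frozen=True)
class LearnedPraxeology:
    """Hypothetical sextuplet: success rate plus one θ-level per praxeology."""

    success_rate: float  # percentage of successful tasks, 0 to 100
    levels: tuple        # one θ-level per praxeology, in PRAXEOLOGIES order

    def __post_init__(self):
        if not 0 <= self.success_rate <= 100:
            raise ValueError("success_rate must be a percentage")
        if len(self.levels) != len(PRAXEOLOGIES):
            raise ValueError("expected one θ-level per local praxeology")
        for level in self.levels:
            if level not in THETA_LEVELS:
                raise ValueError(f"unknown θ-level: {level!r}")

    def as_sextuplet(self):
        return (self.success_rate, *self.levels)


# Number of θ-level profiles if every praxeology could take every level.
N_PROFILES = len(THETA_LEVELS) ** len(PRAXEOLOGIES)

# A made-up student, for illustration only.
student = LearnedPraxeology(
    55.0, ("Adequate", "Weakly-Adequate", "Adequate", "Under-Construction", "Old")
)
print(student.as_sextuplet())
print(N_PROFILES)  # 1024
```

In practice, only a small fraction of these profiles is expected to occur, which is what motivates looking for clusters of similar students and classes.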

In earlier projects using Pépite, an algorithm determined the dominant θ-level for each local praxeology (Grugeon-Allys et al., 2022), with thresholds set empirically. In addition, we defined three groups of students (Grugeon-Allys et al., 2018) to help teachers organize their teaching according to their students’ learning needs. In each group, one type of reasoning predominates: adequate algebraic reasoning, algebraic reasoning under construction, or reasoning carried over from elementary school. This algorithm is not used in the present study, as we perform statistical analyses on all the θ-levels used by students in their answers.

1.6 Research questions

A teacher needs an overview of his or her class to know what the students’ learning needs are. Even if the potential number of learned praxeologies (sextuplets) is large, some students’ learned praxeologies are likely to be close, that is, to share many of the same θ-levels across all tasks (for instance, a majority of “Adequate” levels per local praxeology). The composition of one class may differ from that of another, depending on the praxeologies learned by its students. Indeed, any class may contain a varying number of students with adequate, old, or under-construction learned praxeologies, which does not offer the same learning conditions to all students. For classes, we therefore ask (RQ1): Are there classes that are similar in terms of learning, that is, whose students’ answers across all tasks show mostly the same θ-levels of local praxeologies? For students, we ask (RQ2): Are there students with similar learned praxeologies? In addition, for similar classes and students, we want to know (RQ3) whether there are variations between the beginning and the end of the grade 9 year and (RQ4) whether these variations depend on the class effect (in terms of learning).

2 Methodology

This large-scale study involves a sample of 36 classes, their 36 teachers, and 771 students. We begin by presenting how the sample was constituted. We then describe the construction of the databases, showing how the reference model of algebra and the θ-levels serve as a basis for structuring the data and interpreting the statistical analyses. Finally, we present the multivariate descriptive statistical methods used to obtain clusters of classes and students in order to establish learning similarities between classes (RQ1) and between students (RQ2) at the beginning and at the end of the year (RQ3). Studying how classes and students move between clusters from the beginning to the end of the year enables us to analyze variations over the school year in relation to the class effect (RQ4).

2.1 Sampling

The sample was constructed with the help of the Limoges Académie French educational authority.Footnote 5 It is not representative of the population studied, but it was built according to specific criteria, presented in Table 8, to ensure a diversity of teaching contexts (rural or urban, public or private, priority education or not) and of teachers (experience and age) within the Académie.

Table 8 Criteria used to select the sample

The 771 grade 9 students were distributed in 36 classes from the same Académie. The students in the sample classes took the same Pépite algebra test twice during the school year: Test 1 at the beginning and Test 2 at the end of grade 9. Each test was administered during a 50-min class period. The administration of the tests in schools was managed by the DEPP and the Limoges Académie during the 2018–2019 school year, which was not a period of educational reform in France. We did not meet the teachers or the students and had no information on how the teachers planned their lessons or on what was taught.

2.2 Organizing the data

To study similar classes and similar students, we built a database of students’ responses coded in relation to the θ-levels of each local praxeology of algebra. Once the 771 students had taken the tests, the collected data were anonymized, cleaned, and organized into databases in the form of individuals/variables tables. Cleaning kept only the 25 classes in which at least 15 students took Test 1 and the 454 students who took both tests. We constructed two class databases, C1 and C2 (the unit of analysis is the class), one for each test, to answer RQ1 and RQ3. Similarly, we constructed two student databases, S1 and S2 (the unit of analysis is the individual student), one for each test, to answer RQ2 and RQ3.
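The selection rule can be sketched as follows. This is a hypothetical reconstruction: the record layout (one `(student_id, class_id, test)` tuple per test taken), the function name, and the choice to keep only students who belong to retained classes are our assumptions, since the text does not specify how the two filters interact.

```python
from collections import defaultdict


def select_sample(records, min_test1_takers=15):
    """Apply the cleaning rule described in the text (hypothetical sketch).

    records: iterable of (student_id, class_id, test) tuples, test in {1, 2}.
    Returns the set of retained classes and the set of retained students.
    """
    tests_taken = defaultdict(set)   # student -> tests taken
    class_of = {}                    # student -> class
    test1_takers = defaultdict(set)  # class -> students who took Test 1
    for student, cls, test in records:
        tests_taken[student].add(test)
        class_of[student] = cls
        if test == 1:
            test1_takers[cls].add(student)

    # Classes in which at least `min_test1_takers` students took Test 1.
    classes = {c for c, s in test1_takers.items() if len(s) >= min_test1_takers}
    # Students who took both tests; assumed to be counted in retained classes only.
    students = {
        s for s, tests in tests_taken.items()
        if {1, 2} <= tests and class_of[s] in classes
    }
    return classes, students


# Toy data: class "X" has 15 Test 1 takers (12 of whom also took Test 2);
# class "Y" has only 10 students, all of whom took both tests.
records = (
    [(f"X{i}", "X", 1) for i in range(15)]
    + [(f"X{i}", "X", 2) for i in range(12)]
    + [(f"Y{i}", "Y", t) for i in range(10) for t in (1, 2)]
)
classes, students = select_sample(records)
print(sorted(classes), len(students))  # ['X'] 12
```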

The databases are structured by continuous quantitative statistical variables concerning the success rate and the θ-levels of the local algebraic praxeologies. We built these databases from the initial coding of the students’ answers in Pépite (Fig. 2), where each row contains the coding of a student’s answer to an item and each column corresponds to a θ-level of a local praxeology. A 1 is assigned when the answer corresponds to that θ-level.
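The 0/1 coding just described can be sketched as follows. The item identifier, the answer labels, and the coding table are hypothetical placeholders, not the actual a priori analysis of Tables 5 to 7; column names combine the praxeology abbreviations (M, P, CN, CA, R) with the four θ-level names used in the results. The sketch covers only the θ-level columns.

```python
# Hypothetical a priori coding table (NOT the codes of Tables 5 to 7):
# (item, anticipated answer) -> {local praxeology: θ-level revealed}.
A_PRIORI = {
    ("item_k", "answer_1"): {"P": "Old"},
    ("item_k", "answer_2"): {"P": "Under-Construction", "CA": "Under-Construction"},
    ("item_k", "answer_3"): {"P": "Adequate", "CA": "Adequate"},
}

PRAXEOLOGIES = ("M", "P", "CN", "CA", "R")
THETA_LEVELS = ("Adequate", "Weakly-Adequate", "Under-Construction", "Old")
# One 0/1 column per (praxeology, θ-level) pair, e.g. "CA_Adequate".
COLUMNS = [f"{p}_{level}" for p in PRAXEOLOGIES for level in THETA_LEVELS]


def code_answer(item, answer):
    """Return one row of the coding table for a student's answer to an item.

    An answer not anticipated a priori gets a row of zeros, i.e. it is
    left unanalyzed rather than guessed.
    """
    row = dict.fromkeys(COLUMNS, 0)
    for praxeology, level in A_PRIORI.get((item, answer), {}).items():
        row[f"{praxeology}_{level}"] = 1
    return row


row = code_answer("item_k", "answer_2")
print([col for col, value in row.items() if value == 1])
# ['P_Under-Construction', 'CA_Under-Construction']
```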

Fig. 2 Coding of the answers to the 24 items in Pépite from student 22 in class 1016269

For the classes database, an extract of which is presented in Fig. 3, the values of the variables displayed in the columns are the results of several calculations, carried out on tasks answered by the students of each class:

  • A Success rate: the percentage of tasks successfully answered by the students in the class.

  • A Failure rate: the percentage of failed items. The sum of the Success and Failure rates does not necessarily equal 100%, because the calculation does not consider the tasks not processed and the answers not analyzed by the Pépite software.

  • A rate on each θ-level for each praxeology of algebra: percentage calculated from the sum of the “1” for each variable, for all the answers on the tasks processed by the students in the class.

Fig. 3 Extract from the classes database for Test 1, showing the proportion of success and failure and the various praxeologies

Note that not all θ-levels appear in Fig. 2. To obtain consistent statistical analyses, some θ-levels had to be grouped because some students did not answer the open-ended modeling and proof tasks. The same rates are calculated for each student on each test:

  • A Success rate: the percentage of items the student answered successfully.

  • A Failure rate: percentage of failed items.

  • A rate on each θ-level for each praxeology: percentage calculated from the sum of the 1 for each θ-level, for all the student’s answers.
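A minimal sketch of these aggregations, assuming the coded answers are available as plain Python records, is given below. Two points are our assumptions: the denominator of each θ-level rate is the number of answers to items that mobilize the corresponding praxeology, and unprocessed or unanalyzed answers count in the Success and Failure denominators but in neither numerator, which is why the two rates can sum to less than 100%. The `group_levels` helper (a hypothetical name) shows how two θ-levels of the same praxeology, such as Representing_Adequate and Representing_Weakly-Adequate, can be merged; summing is meaningful here only because both rates share the same denominator.

```python
from collections import Counter, defaultdict


def rates(answers, key):
    """Aggregate coded answers into percentage variables per student or class.

    answers: list of dicts with keys
      "student", "class": identifiers
      "status": "success", "failure", or None (not processed or not analyzed)
      "mobilized": set of local praxeologies the item involves
      "codes": {praxeology: θ-level} assigned to the answer
    key: "student" or "class"
    """
    items = Counter()                 # answers per group
    status = defaultdict(Counter)     # success and failure counts per group
    mobilized = defaultdict(Counter)  # answers per group involving each praxeology
    coded = defaultdict(Counter)      # "praxeology_level" counts per group
    for a in answers:
        g = a[key]
        items[g] += 1
        if a["status"] is not None:
            status[g][a["status"]] += 1
        for p in a["mobilized"]:
            mobilized[g][p] += 1
        for p, level in a["codes"].items():
            if p not in a["mobilized"]:
                raise ValueError(f"{p} coded on an item that does not mobilize it")
            coded[g][f"{p}_{level}"] += 1

    table = {}
    for g in items:
        # Denominator = all items, so Success + Failure can fall below 100.
        row = {
            "Success": 100 * status[g]["success"] / items[g],
            "Failure": 100 * status[g]["failure"] / items[g],
        }
        # Assumed denominator: answers to items mobilizing that praxeology.
        for col, n in coded[g].items():
            praxeology = col.split("_", 1)[0]
            row[col] = 100 * n / mobilized[g][praxeology]
        table[g] = row
    return table


def group_levels(row, praxeology, levels):
    """Merge several θ-level rates of one praxeology into a single variable."""
    name = "_".join([praxeology, *levels])
    row[name] = sum(row.get(f"{praxeology}_{lvl}", 0.0) for lvl in levels)
    return name


# Toy data: two students of one class, two items each (Pépite has 24).
CA, R = "Calculating-Algebraically", "Representing"
answers = [
    {"student": "s1", "class": "c1", "status": "success", "mobilized": {CA}, "codes": {CA: "Adequate"}},
    {"student": "s1", "class": "c1", "status": None, "mobilized": {CA, R}, "codes": {}},
    {"student": "s2", "class": "c1", "status": "failure", "mobilized": {CA}, "codes": {CA: "Old"}},
    {"student": "s2", "class": "c1", "status": "success", "mobilized": {CA, R},
     "codes": {CA: "Adequate", R: "Weakly-Adequate"}},
]
by_student = rates(answers, "student")
by_class = rates(answers, "class")
merged = group_levels(by_class["c1"], R, ["Adequate", "Weakly-Adequate"])
print(by_student["s1"])
print(merged, by_class["c1"][merged])
```

In this toy example, student s1 has a Success rate of 50% and a Failure rate of 0% because one of the two answers was not analyzed.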

2.3 Multivariate descriptive analysis

Given our research questions and the large sample, we apply multivariate descriptive statistical analyses to these databases, in particular principal component analysis (PCA) and hierarchical agglomerative clustering (HAC), to determine clusters both of classes and students and the variables (θ-levels for each local praxeology) that best identify them.

The objective of a PCA is to identify the structure of the data along its most relevant dimensions, using a factorial method that gives an overview of the similarities between individuals: here, first the characterization of classes in terms of learning and then that of students. The method constructs and selects new variables, the principal components, obtained from the correlation matrix of the data; these summarize the most important information in the database. The principal components define one or more factorial planes that maximize the information (variance) of the cloud of individuals projected onto them (Hahn & Macé, 2017). The quality of representation of individuals and variables is measured by their squared cosines (cos2) (Hahn & Macé, 2017). The graphical representations used in PCA, such as correlation circles and scatterplots of individuals, are visual summaries. Correlation circles indicate which variables are most correlated with each factorial axis and thus contribute most to it. In this study, PCAs were performed on the class databases and then on the student databases for Tests 1 and 2. The scatterplots of classes and students allow us to locate them relative to each other and along the factorial axes we have interpreted.
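The quantities involved in a normalized PCA (one performed on the correlation matrix) can be sketched in NumPy as follows. This is a generic textbook formulation, not the software used in the study, which the text does not name; the function name and the toy data are ours. The percentage of “information” carried by an axis is its eigenvalue divided by the sum of eigenvalues, a variable’s cos2 on an axis is its squared correlation with that axis, and an individual’s cos2 is its squared coordinate divided by its squared distance to the centroid.

```python
import numpy as np


def normalized_pca(X):
    """PCA on the correlation matrix (standardized variables).

    X: (n_individuals, n_variables) array; no column may be constant and
    no individual may lie exactly at the centroid.
    Returns explained-variance percentages, individual coordinates,
    variable coordinates (correlations with the axes), and cos2 values.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardized data
    R = np.corrcoef(X, rowvar=False)              # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    eigvecs = eigvecs[:, order]

    explained = 100 * eigvals / eigvals.sum()     # "% of the information" per axis
    ind_coords = Z @ eigvecs                      # projections of individuals
    var_coords = eigvecs * np.sqrt(eigvals)       # correlation of each variable with each axis
    # cos2 of an individual on an axis: squared coordinate divided by the
    # squared distance of the individual to the centroid (all axes together).
    ind_cos2 = ind_coords**2 / (ind_coords**2).sum(axis=1, keepdims=True)
    var_cos2 = var_coords**2                      # squared correlations
    return explained, ind_coords, var_coords, ind_cos2, var_cos2


# Toy data: 25 "classes" described by 4 percentage-like variables.
rng = np.random.default_rng(0)
level = rng.uniform(20, 60, size=25)
X = np.column_stack([
    level + rng.normal(0, 3, 25),        # a Success-like variable
    100 - level + rng.normal(0, 3, 25),  # a Failure-like variable
    level / 2 + rng.normal(0, 5, 25),
    rng.uniform(0, 50, 25),              # unrelated variable
])
explained, ind_coords, var_coords, ind_cos2, var_cos2 = normalized_pca(X)
print(explained.round(1))
print(var_cos2[:, 0].round(2))  # quality of representation on Axis 1
```

Dedicated statistical packages provide these quantities directly; the hand-written version only makes the definitions explicit.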

HACs are then performed to group classes or students into clusters that are similar in terms of learning. We chose this clustering method because the structure of the clusters is not known in advance and our data have an intrinsic hierarchical structure (a hierarchy of θ-levels for each local praxeology). An HAC produces a series of nested partitions of the individuals by successively merging the closest individuals or clusters. The HACs were performed on the data reduced by the PCAs, which simplifies the data while preserving the essential information. Comparing the clusters obtained by PCA and HAC on Test 1 with those on Test 2, for classes and for students, enables us to study the variation of learning over a school year (RQ3) and its link with the composition of the classes in terms of learning (RQ4).
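The text does not specify the linkage criterion used by the HACs. Ward’s criterion, commonly used when clustering is applied to principal-component coordinates, is assumed in the naive sketch below, whose function name and toy data are hypothetical. In practice, a library routine such as `scipy.cluster.hierarchy.linkage(coords, method="ward")` followed by `scipy.cluster.hierarchy.fcluster(..., t=3, criterion="maxclust")` would be used; the explicit loop only shows how the nested partitions are built.

```python
import numpy as np


def ward_hac(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    Starts from singletons and repeatedly merges the pair of clusters whose
    merger least increases the within-cluster sum of squares, stopping when
    `n_clusters` remain (i.e. cutting the dendrogram at that level).
    """
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]
    merge_costs = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pa, pb = points[clusters[a]], points[clusters[b]]
                na, nb = len(pa), len(pb)
                diff = pa.mean(axis=0) - pb.mean(axis=0)
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merge_costs.append(cost)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(points), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels, merge_costs


# Toy principal-component coordinates with three well-separated groups.
coords = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
labels, merge_costs = ward_hac(coords, n_clusters=3)
print(labels)       # [0 0 1 1 2 2]
print(merge_costs)  # [0.5, 0.5, 0.5]
```

Because Ward merge costs never decrease, the recorded costs can also be used to judge where to cut the dendrogram.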

3 Results: similar classes in terms of learning and their variation over the school year

3.1 A multivariate analysis on test 1 results per class

We identify similar classes and their characteristics on the C1 database. To facilitate PCA, certain statistical variables have been grouped together. For example, Representing_Adequate and Representing_Weakly-Adequate have been grouped together into Representing_Adequate_Weakly-Adequate.

The PCA identifies a first factorial plane (Axes 1 and 2, 71% of the information) and the four variables correlated with Axis 1 (59% of the information) that contribute most to its formation, namely Failure, Success, Calculating-Algebraically_Adequate, and Representing_Adequate_Weakly-Adequate (Fig. 4). These variables are very well represented, since their cos2 is greater than 0.8. The Calculating-Numerically_Weakly-Adequate and Calculating-Numerically_Old variables are correlated with Axis 2 (12% of the information), but with a lower quality of representation.

Fig. 4 Correlation circle in the first factorial plane for Test 1 (N = 25)

In the first factorial plane (Fig. 5), the classes located farthest to the right, in the first and fourth quadrants, have high Success rates and high rates of Calculating-Algebraically_Weakly-Adequate (correlated with Axis 1) and Calculating-Numerically_Weakly-Adequate, unlike the classes located farthest to the left. For several classes located near the center, with cos2 close to or below 0.25, the quality of representation is poorer than for classes with cos2 greater than 0.5.

Fig. 5 Distribution of classes in Test 1 on the first factorial plane and quality of their representation (N = 25)

The second factorial plane (Axes 1 and 3, 68% of the information) shows a good quality of representation for the variable Calculating-Numerically_Under-Construction on Axis 3 (9% of the information) (Fig. 6). Figure 7 shows the distribution of the classes on the second factorial plane.

Fig. 6 Correlation circle of the second factorial plane for Test 1 (N = 25)

Fig. 7 Distribution of classes on the second factorial plane and quality of representation for Test 1 (N = 25)

We interpret these two factorial planes as relating the Adequate or Old θ-levels of algebraic praxeologies and θ-levels of Calculating-Numerically in order to study class similarity.

The HAC then identifies three clusters of similar classes (Fig. 8), called Cl-A1 (4 classes), Cl-B1 (14 classes), and Cl-C1 (7 classes) and described in Fig. 9.

Fig. 8 Three clusters of similar classes on Test 1 (N = 25)

Fig. 9 Percentages for the variables that best contribute to the formation of the first factorial plane on Test 1 (a) and Test 2 (b)

Cluster Cl-A1 has a higher Success rate (53%) than Failure (40%), and the rates for the Adequate or Weakly-Adequate θ-levels concerning each praxeology are close to 50%. At the beginning of the school year, the Cl-A1 classes contain many students who use the algebraic practices expected at this school level.

Cluster Cl-B1 has a lower Success rate (41%) than Failure rate (53%). In these classes, the rates for the Adequate or Weakly-Adequate θ-levels concerning Calculating-Algebraically, Modeling, and Representing are close to 30%. However, the rate for Calculating-Numerically_Weakly-Adequate is 46%, which suggests that Calculating-Numerically can serve as a lever for learning in these classes at the beginning of the school year. This cluster brings together classes of rather heterogeneous composition.

Cluster Cl-C1 has a very low Success rate (30%) and rates below 30% for the Adequate or Weakly-Adequate levels. Moreover, the rate for Calculating-Numerically_Old is high (60%). At the beginning of the school year, the Cl-C1 classes predominantly use primary school arithmetic practices to solve algebraic problems. For these classes, which are far from what the school expects, the Calculating-Numerically praxeology is not a lever.

3.2 Variation during a school year

We performed the same analyses on the C2 database. The variables that best contribute to the formation of the first factorial plane are the same as in Test 1, except for Calculating-Numerically_Old (absent) and Proving_Adequate (new). We study how the classes are distributed in the clusters of the two tests (RQ1) and how the classes move from one cluster to another (RQ3).

The PCA and HAC analyses distinguish three clusters, called Cl-A2 (9 classes), Cl-B2 (13 classes) and Cl-C2 (3 classes) (Fig. 10).

Fig. 10 Three clusters of similar classes on Test 2 (N = 25)

The characteristics of the three clusters in Test 1 (Cl-A1, Cl-B1, and Cl-C1) and those in Test 2 (Cl-A2, Cl-B2, and Cl-C2) are broadly comparable for all the variables that contribute most to the formation of the first factorial plane (Fig. 9), although the Success rate in Test 2 is higher than in Test 1 for each cluster (5 points higher for Cl-A2 and Cl-B2 and 2 points higher for Cl-C2).

Comparing the number of classes in each cluster shows that Cl-A2 contains more classes than Cl-A1 (9 vs. 4), Cl-C2 fewer than Cl-C1 (3 vs. 7), and Cl-B2 about as many as Cl-B1 (13 vs. 14). Furthermore, as shown in Table 9, 10 classes progress by changing clusters: 5 move from Cl-B1 to Cl-A2 (especially on Proving_Adequate) and 5 from Cl-C1 to Cl-B2. On average, by the end of the year, students in the latter five classes improve their arithmetic practices and begin to use the algebraic practices expected at this grade level to solve algebraic tasks, even if some algebraic techniques are still incorrect. Only one class regresses, from Cl-B1 to Cl-C2, which means that overall this class uses more arithmetic practices than at the beginning of the year. The other classes progress within the same cluster: from Cl-A1 to Cl-A2 (4 classes), from Cl-B1 to Cl-B2 (8 classes), or from Cl-C1 to Cl-C2 (2 classes). These classes perform better but without significant change in their practices.

Table 9 Variation of the distribution by cluster of classes between Tests 1 and 2
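As a consistency check on Table 9, the pairwise counts reported in the text can be cross-tabulated: the row totals must equal the Test 1 cluster sizes (4, 14, and 7 classes) and the column totals the Test 2 sizes (9, 13, and 3). Since Cl-B1 contains 14 classes, of which 5 move to Cl-A2 and 1 to Cl-C2, the number of Cl-B1 classes remaining in Cl-B2 must be 14 − 5 − 1 = 8, which is also the only value that gives Cl-B2 its 13 classes. The sketch below (variable and function names are ours) encodes these counts and recomputes the margins and the number of classes that change cluster.

```python
# Moves of the 25 classes between Test 1 and Test 2 clusters, from the text;
# Cl-B1 -> Cl-B2 is derived as 14 - 5 - 1 = 8.
MOVES = {
    ("Cl-A1", "Cl-A2"): 4,
    ("Cl-B1", "Cl-A2"): 5,
    ("Cl-B1", "Cl-B2"): 8,
    ("Cl-B1", "Cl-C2"): 1,
    ("Cl-C1", "Cl-B2"): 5,
    ("Cl-C1", "Cl-C2"): 2,
}


def margins(moves):
    """Row totals (Test 1 clusters) and column totals (Test 2 clusters)."""
    before, after = {}, {}
    for (c1, c2), n in moves.items():
        before[c1] = before.get(c1, 0) + n
        after[c2] = after.get(c2, 0) + n
    return before, after


before, after = margins(MOVES)
# Classes whose cluster letter (A, B, or C) changes between the two tests.
changed = sum(n for (c1, c2), n in MOVES.items() if c1[3] != c2[3])
print(before)   # {'Cl-A1': 4, 'Cl-B1': 14, 'Cl-C1': 7}
print(after)    # {'Cl-A2': 9, 'Cl-B2': 13, 'Cl-C2': 3}
print(changed)  # 11
```

The 11 changes correspond to the 10 classes that progress by changing clusters plus the one class that regresses.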

Statistical analysis revealed three clusters of similar classes at the beginning and at the end of the year, which answers RQ1. The levels on Calculating-Numerically, Calculating-Algebraically, Modeling, and Representing (i.e., 4 of the 5 local praxeologies) that best represent these clusters are consistent within each cluster, with rates at the Adequate or Weakly-Adequate levels close to or above 50% for Cl-A1 and Cl-A2, between 30 and 50% for Cl-B1 and Cl-B2, and close to or below 30% for Cl-C1 and Cl-C2 (Fig. 9). These characteristics are consistent with the informal labels that teachers often use to designate “good” and “weak” classes, but they provide explicit criteria for algebra learning that can guide teaching decisions to manage class heterogeneity.

To answer RQ3 for classes, comparing the overall distribution of classes within clusters shows that not all classes move in the same direction in terms of learning between the beginning and the end of the year. All Cl-A1 classes move to Cl-A2, while a majority (5 out of 7) of Cl-C1 classes move to Cl-B2, marking an evolution in the Calculating-Numerically praxeology. Cl-B1 classes evolve in very different ways.

These analyses are based on rates calculated per class, which provide no information about the learning of the individual students who make up these classes. This is why the following analyses focus on students (RQ2 and RQ3).

4 Results: similar students’ learned praxeologies in algebra and their variation during a school year

4.1 A multivariate analysis of Test 1 results per student

We identify clusters of students with common learning characteristics in the S1 database. The PCA identifies a first factorial plane (62% of the information) with six variables that contribute most to its formation (Fig. 11): Success, Calculating-Numerically_Weakly-Adequate, Failure, Calculating-Numerically_Under-Construction_Old, Calculating-Algebraically_Adequate, and Representing_Adequate_Weakly-Adequate. As for classes, we interpret the first factorial plane as relating the Adequate or Old θ-levels of the algebraic learned praxeologies to the θ-levels of Calculating-Numerically. The HAC then leads to the identification of three clusters of students (Fig. 12), named St-A1 (26%), St-B1 (39%), and St-C1 (35%), and described in Fig. 13.
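The two-step procedure (a PCA, then an HAC on the resulting factor scores) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a students × variables matrix of rates, standardizes the variables, keeps the first two principal components, and runs a naive Ward agglomeration on those scores, as is common practice when an HAC follows a PCA. The function names are hypothetical, and the data are random placeholders whose six columns merely stand in for rate variables such as Success or Failure; they are not the study's data.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Standardized PCA via SVD: factor scores and share of inertia per axis."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing constant columns by zero
    Z = (X - X.mean(axis=0)) / std
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    ratios = s**2 / np.sum(s**2)  # eigenvalue shares of the correlation matrix
    return Z @ vt[:n_components].T, ratios[:n_components]


def ward_hac(points, n_clusters):
    """Naive agglomerative clustering with Ward's merge cost (fine for small n)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pa, pb = points[clusters[a]], points[clusters[b]]
                na, nb = len(pa), len(pb)
                # Increase in within-cluster sum of squares if a and b are merged
                gap = pa.mean(axis=0) - pb.mean(axis=0)
                cost = na * nb / (na + nb) * np.sum(gap**2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Placeholder data: 45 "students" x 6 rate variables drawn around three profiles
rng = np.random.default_rng(0)
profiles = np.array([[0.6, 0.3, 0.6, 0.1, 0.6, 0.6],
                     [0.4, 0.5, 0.3, 0.4, 0.3, 0.3],
                     [0.2, 0.6, 0.1, 0.8, 0.1, 0.1]])
X = np.vstack([p + rng.normal(0, 0.05, size=(15, 6)) for p in profiles])

scores, ratios = pca_scores(X)
labels = ward_hac(scores, n_clusters=3)
print(ratios.sum().round(2), np.bincount(labels))
```

In practice a library implementation of Ward linkage would replace the quadratic loop; the explicit version only makes the merge criterion visible.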

Fig. 11

Correlation circle in the first factorial plane for Test 1

Fig. 12

Three clusters of similar students on Test 1 (N = 454)

Fig. 13

Percentages for the variables that best contribute to the formation of the first factorial plane on Test 1 (a) and Test 2 (b)

Cluster St-A1 includes students whose Success rate (60%) is very high compared to their Failure rate (32%) and whose rates at the Adequate or Weakly-Adequate θ-levels on three of the five local praxeologies (Calculating-Numerically, Calculating-Algebraically, and Representing) are close to 60%. St-A1 students start the school year with already well-established algebraic practices.

Cluster St-B1 is composed of students whose Success rate (38%) is lower than their Failure rate (47%) and whose rates at the Adequate or Weakly-Adequate levels are close to 30% on Calculating-Algebraically and Representing. Their rate at the Adequate and Weakly-Adequate levels in Calculating-Numerically (56%) is higher than their Adequate and Weakly-Adequate rates on the other praxeologies, and higher than their rate at the Under-Construction and Old levels (40%). Students in St-B1 start the year having built up Calculating-Numerically praxeologies, which are a possible point of support for moving towards algebraic practices.

Cluster St-C1 is composed of students whose Success rate (22%) is very low compared to their Failure rate (63%) and whose rates at the Adequate or Weakly-Adequate levels on three of the five local praxeologies (Calculating-Numerically, Calculating-Algebraically, and Representing) are below 20%. Calculating-Numerically is not a point of support for these students (82% at the Under-Construction or Old levels versus 16% at the Weakly-Adequate level). St-C1 students start the year well below school expectations.
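The cluster percentages described above (Fig. 13) are, in effect, per-cluster averages of per-student rates. A minimal sketch, assuming each variable is stored as a per-student rate between 0 and 1 and that a cluster's percentage is the unweighted mean over its members; the function name and the toy input are hypothetical.

```python
from collections import defaultdict


def cluster_profiles(rates, labels):
    """Mean rate (in %) of each variable within each cluster.

    rates: list of dicts {variable: rate in [0, 1]}, one per student
    labels: list of cluster names, in the same order as rates
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, label in zip(rates, labels):
        counts[label] += 1
        for variable, value in row.items():
            sums[label][variable] += value
    return {label: {v: round(100 * total / counts[label], 1) for v, total in by_var.items()}
            for label, by_var in sums.items()}


# Hypothetical toy input: three students, two variables
rates = [{"Success": 0.8, "Failure": 0.1},
         {"Success": 0.6, "Failure": 0.3},
         {"Success": 0.2, "Failure": 0.7}]
labels = ["St-A1", "St-A1", "St-C1"]
print(cluster_profiles(rates, labels))
# {'St-A1': {'Success': 70.0, 'Failure': 20.0}, 'St-C1': {'Success': 20.0, 'Failure': 70.0}}
```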

4.2 Variation between Tests 1 and 2

We performed the same analysis on the S2 database. The variables that contribute most to the formation of the first factorial plane are the same, and the analysis again leads to three clusters: St-A2 (25%), St-B2 (35%), and St-C2 (40%) (Fig. 14). To answer RQ3 for students, we then compared how students are distributed across clusters on the two tests, and which cluster each student belongs to at the beginning and at the end of the year.

Fig. 14

Three clusters of similar students for Test 2 (N = 454)

The clusters for Test 1 and Test 2 are comparable across all variables (Fig. 13), although some variables change more markedly than others depending on the cluster. The Success rate on Test 2 is higher than on Test 1 for each cluster (16 points higher for St-A2, 11 for St-B2, and 5 for St-C2). Students in St-C2 make progress in Calculating-Numerically, with their rate at the Weakly-Adequate level almost doubling, although it remains low (31%), but not in Calculating-Algebraically_Adequate or Representing_Adequate-Weakly-Adequate. Progress in the Calculating-Numerically praxeology therefore does not always serve as a point of support for algebraic learning.

The relative size of the clusters varies only slightly between the two tests: St-C2 has slightly more students than St-C1 (about 40% versus 35%), and the reverse is true for St-B2 and St-B1 (35% versus 39%).

According to Table 10, 55% of the students progressed without changing cluster (16% from St-A1 to St-A2, 16% from St-B1 to St-B2, and 23% from St-C1 to St-C2); that is, there are no major advances in the θ-levels they use to solve algebraic problems. Twenty percent of the students progressed by changing cluster (8% from St-B1 to St-A2, 1% from St-C1 to St-A2, and 11% from St-C1 to St-B2). At the end of the year, these students belonged to clusters whose learned praxeologies were closer to what is expected at the end of 9th grade. However, 25% of the students regressed and, at the end of the year, belonged to clusters whose praxeologies differ from what is expected (8% from St-A1 to St-B2, 15% from St-B1 to St-C2, and 2% from St-A1 to St-C2).

Table 10 Variation of the distribution by cluster of students between Test 1 and Test 2
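The student percentages reported for Table 10 can be checked by classifying each of the nine transitions and summing. This is a sketch for verification only; the names are hypothetical, and counting same-cluster moves as progress follows the text's convention, since success rates rise in every cluster. The three within-cluster shares sum to 16 + 16 + 23 = 55%, and the margins reproduce the cluster sizes of Tests 1 and 2.

```python
from collections import Counter

# Percentage of students per (Test 1 cluster, Test 2 cluster), as reported in the text
STUDENT_MOVES = {
    ("St-A1", "St-A2"): 16, ("St-B1", "St-B2"): 16, ("St-C1", "St-C2"): 23,
    ("St-B1", "St-A2"): 8, ("St-C1", "St-A2"): 1, ("St-C1", "St-B2"): 11,
    ("St-A1", "St-B2"): 8, ("St-B1", "St-C2"): 15, ("St-A1", "St-C2"): 2,
}
RANK = {"A": 2, "B": 1, "C": 0}  # higher rank = closer to end-of-grade expectations


def move_type(src, dst):
    """Classify a move; names look like 'St-A1', so the cluster letter is at index 3."""
    diff = RANK[dst[3]] - RANK[src[3]]
    if diff == 0:
        return "progress without changing cluster"
    return "progress by changing cluster" if diff > 0 else "regression"


totals, test1, test2 = Counter(), Counter(), Counter()
for (src, dst), pct in STUDENT_MOVES.items():
    totals[move_type(src, dst)] += pct
    test1[src] += pct
    test2[dst] += pct

print(dict(totals))
print(dict(test1), dict(test2))
```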

Taking these results together, we answer RQ2: the students’ learned praxeologies fall into three clusters with similar characteristics at the beginning and at the end of the year. These clusters are consistent in terms of learning at the Adequate and/or Weakly-Adequate levels of the representative local praxeologies, Calculating-Numerically, Calculating-Algebraically, and Representing: rates above 50% for St-A1 and St-A2, between 30 and 50% for St-B1 and St-B2, and below 30% for St-C1 and St-C2 (Fig. 13). In light of these results, we associate each cluster with a dominance, described by the same level on these three local praxeologies: Adequate and/or Weakly-Adequate for St-A1 and St-A2, Under-Construction for St-B1 and St-B2, and Old for St-C1 and St-C2. Furthermore, we find that the Modeling and Proving praxeologies are not representative in the characterization of the clusters, which raises questions about the place given to these praxeologies in algebra teaching.
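The three-band reading of the cluster profiles can be written as a small rule, which may help when discussing a single profile. This is a hypothetical reading aid, not how the clusters were formed (they come from the HAC); the function names are illustrative, and the handling of rates exactly equal to 30% or 50% is an assumption.

```python
# Assumed cut-offs (in %) taken from the bands above; boundary handling is a guess.
LOW, HIGH = 30.0, 50.0
DOMINANCE = {"A": "Adequate and/or Weakly-Adequate", "B": "Under-Construction", "C": "Old"}


def band(rate):
    """Band of one praxeology's Adequate/Weakly-Adequate rate (in %)."""
    if rate > HIGH:
        return "A"
    if rate >= LOW:
        return "B"
    return "C"


def dominance(rates):
    """Majority band over the representative praxeologies; ties go to the higher band."""
    bands = [band(r) for r in rates.values()]
    best = max(sorted(set(bands)), key=bands.count)  # 'A' < 'B' < 'C', first max wins
    return best, DOMINANCE[best]


# Hypothetical profile of one student or class
profile = {"Calculating-Numerically": 55, "Calculating-Algebraically": 33, "Representing": 36}
print(dominance(profile))  # ('B', 'Under-Construction')
```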

Between the beginning and the end of the school year, despite a higher success rate on Test 2, the proportion of students in each cluster remains fairly stable (Table 10): a quarter of the students (St-A1 and St-A2) have learned the praxeologies expected at the end of middle school, between 35 and 40% (St-C1 and St-C2) have old praxeologies, and between 35 and 39% (St-B1 and St-B2) have praxeologies under construction. However, some students change cluster (Table 10): 20% progress and 25% regress. Calculating-Numerically is the central lever for moving from cluster St-C1 to St-B2. These results are comparable to those found in work using Pépite, but they are richer because they indicate what is most representative of the students’ learned praxeologies.

5 Variations in students’ learning within the same class over the school year

We study the links between variations in students’ learning and the class effect (RQ4) over one school year.

To do this, we go deeper into the characterization of similar classes for Test 1, particularly to improve the description of cluster Cl-B1. For each class, we examine the distribution of students according to the cluster to which they belong.

First, we combine multivariate descriptive analyses on classes and students to compare the distribution of students in similar classes on Test 1. In this way, we identify common characteristics in the distribution of students in classes of the same cluster, which we express in terms of threshold percentages of students (Table 11).
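Computing each class’s distribution of students across the student clusters only requires grouping students by class. A minimal sketch, assuming each student record is a (class identifier, Test 1 cluster) pair; the function name and identifiers are hypothetical, and the threshold percentages of Table 11 are not reproduced here.

```python
from collections import Counter, defaultdict


def class_distributions(students):
    """students: iterable of (class_id, student_cluster). Returns % per cluster per class."""
    by_class = defaultdict(Counter)
    for class_id, cluster in students:
        by_class[class_id][cluster] += 1
    result = {}
    for class_id, counts in by_class.items():
        n = sum(counts.values())
        result[class_id] = {c: round(100 * k / n, 1) for c, k in sorted(counts.items())}
    return result


# Hypothetical toy classes
toy = [("class-1", "St-A1"), ("class-1", "St-A1"), ("class-1", "St-B1"), ("class-1", "St-C1"),
       ("class-2", "St-C1"), ("class-2", "St-C1"), ("class-2", "St-B1")]
print(class_distributions(toy))
# {'class-1': {'St-A1': 50.0, 'St-B1': 25.0, 'St-C1': 25.0}, 'class-2': {'St-B1': 33.3, 'St-C1': 66.7}}
```

Threshold rules such as those summarized in Table 11 would then be applied to these per-class percentages.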

Table 11 Class characterization for Cl-A1, Cl-B1 and Cl-C1 clusters

For clusters Cl-A1 and Cl-C1, classes that are similar according to the PCA and HAC have student distributions that follow the same trend (Table 11); this is not the case for cluster Cl-B1. For the classes in this cluster, we distinguish three intervals on the percentages of St-A1 and St-C1 students, which better characterizes how close Cl-B1 classes are to one another and is more useful for interpreting learning variations.

Second, to investigate further how similar classes evolve differently, Fig. 15 shows the variation in the distribution of students between Tests 1 and 2 within each class, distinguishing between students who progress without changing cluster (St-A1 to St-A2, St-B1 to St-B2, St-C1 to St-C2), those who progress by changing cluster (St-B1 to St-A2, St-C1 to St-B2, St-C1 to St-A2), and those who regress (St-A1 to St-B2, St-A1 to St-C2, St-B1 to St-C2).

Fig. 15

For each class (N = 25), percentage of students progressing or regressing between the two tests, and percentage of students taking part in both tests

All classes but one make progress, but not all classes show the same growth in student learning. To compare them, Fig. 15 also gives, as additional information about each class, the percentage of students present for both tests, which varies between 55 and 100% depending on the class. We comment on the most salient results.
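The per-class quantities plotted in Fig. 15 can be sketched as follows, assuming that the progress and regression percentages are computed over the students present for both tests and that attendance is that number divided by the class enrolment. The function name, the input format, and the example class are hypothetical.

```python
RANK = {"A": 2, "B": 1, "C": 0}  # cluster letter -> level; names look like 'St-B1'


def class_summary(enrolled, matched):
    """enrolled: number of students in the class;
    matched: (Test 1 cluster, Test 2 cluster) pairs for students who took both tests."""
    counts = {"progress_within": 0, "progress_across": 0, "regress": 0}
    for src, dst in matched:
        diff = RANK[dst[3]] - RANK[src[3]]
        key = "progress_within" if diff == 0 else ("progress_across" if diff > 0 else "regress")
        counts[key] += 1
    n = len(matched)
    shares = {k: round(100 * v / n, 1) for k, v in counts.items()} if n else {}
    return {"attendance": round(100 * n / enrolled, 1), **shares}


# Hypothetical class of 10 students, 8 of whom took both tests
pairs = [("St-B1", "St-A2"), ("St-B1", "St-A2"), ("St-B1", "St-B2"), ("St-C1", "St-B2"),
         ("St-C1", "St-C2"), ("St-A1", "St-A2"), ("St-A1", "St-B2"), ("St-B1", "St-C2")]
print(class_summary(10, pairs))
# {'attendance': 80.0, 'progress_within': 37.5, 'progress_across': 37.5, 'regress': 25.0}
```

Because the shares are computed only over students present for both tests, a class with low attendance can show high progress rates that say little about its initial composition, which is the difficulty discussed below.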

The seven Cl-C1 classes have a low attendance rate (only 52% to 76% of students took both tests), whereas Cl-A1 classes have a high attendance rate (72% to 100% of students took both tests). Students in Cl-C1 classes seem to progress better overall than those in Cl-A1 classes. Likewise, the students in two classes (1710604 in Cl-B1-2 and 2307079 in Cl-C1) all make progress, but around a third of each class is missing. As absenteeism is very high, these results cannot be interpreted in relation to the initial class composition.

In five of the nine Cl-A2 classes, more than 89% of students took both tests. Three of these classes progressed from Cl-B1-1 or Cl-B1-2 to Cl-A2, with a larger share of students progressing (74 to 80%) than in the other two classes (around 70%).

Although five of the seven Cl-C1 classes progress substantially, moving to Cl-B2, only between half and three quarters of their students were present for both tests. In these classes, 25% to 56% of the students progress by changing cluster. Regression from one cluster to another is, however, much less frequent than in the Cl-A1 or Cl-B1 classes, which may be related to absenteeism.

We cannot interpret these results further or answer RQ4, because absenteeism prevents us from reconstructing the initial composition of the classes.

6 Discussion

The originality of this large-scale statistical study lies in its didactic foundations, in particular the characterization of elementary algebra by five local praxeologies and the hierarchy of students’ reasoning and knowledge by four θ-levels. This reference model of elementary algebra structures the databases and allows us to define similar classes and similar students and to study their variation over a school year (RQ1, RQ2, and RQ3). The statistical analyses used, PCA and HAC, reflect the quality and relevance of the data coding: their results are mutually consistent and interpretable with respect to the reference model (e.g., the description of clusters).

Some of our choices could be questioned. First, we grouped θ-levels of the same praxeology throughout the analyses to handle answers that were missing or impossible to code. This grouping is a limitation when it comes to characterizing clusters by praxeologies and the most representative levels associated with them. Second, we chose to build the class database by aggregating the coding of all students’ responses to all items. Another option would have been to use the student clusters, taking as variables the number of students from each student cluster in each class. This would make it possible to interpret the similarity of classes in terms of learning, and their variation, with regard to class composition, that is, the number of students per cluster at the beginning of the year.

Our study also reveals a methodological limitation: the handling of students absent from one of the two tests, which prevented us from answering RQ4. Limiting absenteeism turns out to be an indispensable condition for studying the dependence, if any, between the evolution of students’ learning and the class effect. These methodological comments lead us to define the conditions for a new study, which could be a large-scale study on a representative sample of students and schools in France.

Beyond class composition in terms of student learning, we could also take into account the teaching choices and practices of French teachers that influence the variation in students’ learned praxeologies. The Praesco⁶ study (Content-Specific Teaching Practices) (Coppé et al., 2021a, 2021b) has shown that there are four different clusters of teaching practices. They depend particularly on whether teachers emphasize algebraic problem-solving or the technical aspect of algebraic calculation, and on whether they take students’ productions into account. To fully answer RQ4, we could examine the possible links between variations in similar classes over a school year and the teaching choices of their teachers, according to the clusters of practices to which they belong.