1 Introduction

A structural approach in teaching early arithmetic has been advocated as a means to help students develop powerful and sustainable ways of solving arithmetic problems. This approach entails teaching that focuses on the structure of part-whole number relations, usually with units larger than one, and involves avoiding strategies that rely on counting in single units. To teach the structure of part-whole number relations means to primarily focus on, for example, 7 as constituted of 5 and 2 (or 4 and 3) rather than its position in the counting sequence. This is necessary in order to understand and make use of powerful ideas in arithmetic problem solving, such as the complementarity of addition and subtraction, that is, that 2 + 5 = 7, 5 + 2 = 7, 7–2 = 5, and 7–5 = 2 are different representations of the same part-whole relation (Baroody et al., 2009). This is assumed to aid in solving addition and subtraction problems, since then an addition can be used to solve a subtraction task and vice versa (Wolters, 1983).

In contrast to making use of the part-whole relations of the numbers in the task, a task (e.g., 32–25=?) may be solved by students by counting backwards in single units, a strategy that is particularly error-prone in higher number ranges (Ellemor-Collins & Wright, 2009). Thus, the recognition of how numbers relate to other numbers and how numbers can be seen as units within units (25 + 5 + 2 = 32, 5 and 2 are parts of 7, which in turn is part of the minuend 32 in the example above) is essential in the early development of arithmetic skills (Björklund & Runesson Kempe, 2022). However, studies have shown that young students are reluctant to change their counting strategies from those they have previously learned, even when they are taught more efficient ways of handling numbers as part-whole relations (e.g., Cheng, 2012; Ellemor-Collins & Wright, 2009; Hopkins et al., 2022). It is therefore important for early arithmetic education to cover the relational nature of numbers and what this implies for what can be done with numbers in arithmetic problem solving, in order to pre-empt a narrow repertoire of arithmetic strategies.

What to teach, and suggestions for which strategies to promote, is an issue dealt with in many studies (e.g., Blöte et al., 2000; Van der Auwera et al., 2023). Using numbers’ part-whole relations as a means to solve arithmetic tasks is assumed to be powerful, but there has not yet been sufficient investigation and conceptualization of how to teach this in order for students to learn it in a sustainable way. For example, it has been observed that students rarely use powerful strategies even when they are known to be capable of using them for problem solving (Heinze et al., 2009; Selter, 2001; Verschaffel et al., 2010).

It is possible, however, that efficient and sustainable learning is promoted not so much by what arithmetic strategy is taught, but rather by how the student experiences arithmetic tasks as structures based on part-whole relations. Thus, there is a difference between teaching specific strategies and teaching a way to experience numbers as structured parts and wholes. The study we present in this paper is based on this theoretically driven approach, namely that developing arithmetic skills in the early years is grounded in ways of experiencing numbers and what can be done with numbers (Neuman, 1987), where in particular experiencing numbers as structured in part-whole relations induces sustainable arithmetic strategies (Kullberg et al., 2020). To clarify the long-term effects of a structural teaching approach, as an alternative to a counting-based approach, we implemented an intervention program that used structural teaching as a basis for learning addition and subtraction, with particular emphasis on part-whole relations and the base-ten unit. The program was implemented in four first-grade classes in Sweden during one school year. The research questions answered in this paper are: (i) Does a structural teaching approach in first grade increase students’ skills in solving addition and subtraction tasks in comparison to a control group? (ii) How does a structural approach affect students’ ways of solving addition and subtraction tasks? Through planned activities that used finger patterns to show static part-whole relations, the students were expected to experience the structure of number relations (for example, experiencing 6/4/10 on their fingers) and to learn to make use of such a structure in arithmetic problem solving.

2 Learning to solve addition and subtraction tasks

The idea of emphasizing numbers’ part-whole relationship is not new; nearly a century ago, Brownell (1935) was already arguing that arithmetic should be taught on the basis of relationships between numbers. However, as Baroody (1987, p. 10) points out, “the basic number combinations are not simply a basket of isolated facts.” Developing number sense in children requires emphasizing the important mathematical relationships reflected in the basic number combinations. Baroody (1987, p. 9) further notes that “The essence of knowledge is structure: elements of information connected by relationships to form an organized and meaningful whole.” Baroody’s statement seems to concern knowledge in general, but clearly relates to knowledge of numbers, considering that experiencing numbers as connected parts liberates more powerful arithmetic operations than merely learning combinations as isolated facts. These powerful strategies include decomposition (c = a + b), commutativity (a + b = b + a), and the complement principle (if a + b = c then c–a = b) (Zhou & Peverly, 2005), which all build on an understanding of the part-whole relations of the numbers in the task.
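
As a minimal illustration (not part of the cited studies), the three principles can be spelled out for a single part-whole triple. The function name `fact_family` is hypothetical and chosen only for this sketch.

```python
def fact_family(part_a: int, part_b: int) -> list[str]:
    """List the addition and subtraction facts that share one part-whole relation."""
    whole = part_a + part_b  # decomposition: whole = part_a + part_b
    facts = [
        f"{part_a} + {part_b} = {whole}",
        f"{part_b} + {part_a} = {whole}",  # commutativity
        f"{whole} - {part_a} = {part_b}",  # complement principle
        f"{whole} - {part_b} = {part_a}",  # complement principle, other part
    ]
    # Remove duplicates (they occur when both parts are equal), keeping order.
    return list(dict.fromkeys(facts))


print(fact_family(2, 5))
```

All four facts are generated from the same triple (2, 5, 7), which is the sense in which they are different representations of one relation rather than isolated facts.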

Piaget (1952) states that “[a]dditive and multiplicative operations are already implied in numbers as such, since a number is an additive union of units, and one-one correspondence between two sets entails multiplication. The real problem, if we wish to reach the roots of these operations, is to discover how the child becomes aware, when he discovers that they exist within numerical compositions” (p. 161). Empirical research has shown that this discovery of numbers’ part-whole relations and how to operate with them, especially when bridging through ten, is not easily done by young students (Björklund, 2021). As a starting point, observations show that a substantial number of students frequently use counting strategies instead of retrieval-based strategies for simple addition (Hopkins et al., 2022). Counting strategies are indeed often successful when the number range is small, but when the number range increases and the numbers and their differences are larger, counting strategies based on single-unit operations generally fail (Björklund & Runesson Kempe, 2022; Ellemor-Collins & Wright, 2009).

Selter (2001) concluded, in a study on three-digit addition and subtraction, that many students appear to be “blind” to the relations between given numbers in a task, and therefore use the same methods and strategies regardless of the task. He argues that students’ sense for number relations does not seem to develop independently of instruction. Therefore, students should be encouraged to consider the nature of the problem before trying to solve the problem. This implies that identifying numbers as relational to one another might be successful because it opens up the possibility for students to identify the structure of a task and then to solve it, for example by using addition to solve a subtraction task (Kullberg et al., 2024). This strategy is considered to be a powerful and sustainable way of completing arithmetic tasks, as it builds on conceptual understanding of numbers’ part-whole relations. However, there are very few reports of students using it (e.g., Heinze et al., 2009; Selter, 2001) and the scarce use of retrieval-based and structure-based strategies among students has been explained in terms of a lack of understanding of the underlying complement principle between addition and subtraction. This hinders their discovery and use of the subtraction by addition strategy and other structure-based ways of reasoning (Torbeyns et al., 2009).

Young students’ ways of experiencing or “seeing” the part-whole relation in a task have recently been shown to be related to their developing arithmetic skills. Those who experienced numbers represented both as one set (e.g., a finger pattern of five fingers on one hand) and as a composed set (e.g., two fingers on one hand and three on the other) were more likely to develop known number facts from a long-term perspective (Kullberg & Björklund, 2020). Thus, solving arithmetic tasks in powerful ways does seem to encompass more than making use of certain strategies or discovering how numbers constitute an additive union of units (Piaget, 1952). It seems also to include a way of experiencing the task and numbers in the task as relational.

Based on empirical studies in which general structural features have been implemented in tasks that require children to engage in structural thinking with a view to generalization, Mulligan and Mitchelmore (2013) claim that students’ awareness of mathematical structure is key in mathematical development. In this view, structuring is a significant part of arithmetical thinking, in that it involves paying attention to properties and relationships between properties, rather than seeing structure as an isolated mathematical object. A structural approach in teaching and learning is further theorized by Venkat et al. (2019) who unpack the concept of mathematical structure and its significance for arithmetic learning: students who experience structure are oriented towards the local mathematical relationships found between elements. This may induce a further orientation also to general properties within, for example, a class of elements or examples. In order to direct students’ awareness to such structural features, teaching ought to highlight relationships between elements through spatial arrangements or notational means.

If the child’s conception of number is characterized by numbers being measured only in their smallest single units, neither local nor general structures can be experienced (Neuman, 1987). This leads to the child relying on counting strategies and working only with single units rather than composing larger units that are related to other units. Children who only experience numbers as single units then end up having to use their fingers to keep track of the units they have counted, and the part-whole relationship of numbers remains unseen. In contrast, Neuman advocates for a structuring way of using fingers in patterns to compose units and to see units within units; for example, a finger pattern “seven” is composed of one whole hand “five” and two more fingers on the other hand. Ahlberg (1997) also concludes from empirical studies that “[w]hen children handle numbers by structuring they do not count on the number sequence in order to keep track of the numbers, but rather structure the numbers in the problem in parts and the whole in order to arrive at an answer” (p. 70). Finger patterns may then be used as a means for structuring numbers, represented in a way that emphasizes part-whole relations due to the undivided five (the whole hand). The child thus learns to “see” numbers as part-part-whole relations (Brissiaud, 1992).

Interventions with younger students have shown that it is possible for them to learn to understand numbers as constituting a part-whole relationship. Moreover, such learning is necessary if they are to develop an awareness of general relationships that can lead to mathematical structure in a more advanced sense, which in turn allows them to make use of strategies such as the complement principle with general applicability. Cheng (2012) suggests, for example, that students should encounter multiple classification tasks that reflect the complex relationships among numbers (5 can be partitioned as 4 and 1 but also 3 and 2), because these help them to learn to understand quantity as part-whole relationships in a more systematic and thorough way. In this way, students learn to see numbers as structural phenomena.

The brief review above of the meaning and role of a structural approach for arithmetic learning gives sufficient support to implement such an approach in primary mathematics education. When implementing new ideas and approaches in education, it is of course of great interest to evaluate the effects of the interventions. The overview given above of what it means to learn arithmetic skills makes it clear that it is not sufficient to assess the frequency of correct and incorrect answers. Based on the conjecture that it matters how a student approaches arithmetic tasks, this will also be the key in evaluating the learning outcomes and, more specifically, the extent to which the students have learnt to enact a structural approach in arithmetic problem solving in the long term.

3 The intervention program

The intervention program was based on findings from previous intervention studies on early arithmetic learning (Björklund et al., 2018, 2021; Kullberg et al., 2020; Neuman, 1987, 2013) theoretically grounded in variation theory (Marton, 2015; Marton & Booth, 1997). The study builds in particular on findings from one intervention with 4- to 5-year-olds (Kullberg et al., 2020), where we found that the children in the intervention group were more successful than the control group in solving tasks in the number range 1–10. One year later, they were still successful in solving tasks in the number range 1–10 using part-whole relations, but did not use this knowledge (decomposing and composing numbers) for solving tasks involving bridging through ten, e.g., 15–7=? (Björklund, 2021). The present study was therefore designed to identify what was critical for first graders in learning to solve tasks bridging through ten using part-whole relations. Variation theory contributes a particular view in which learning is seen as a change in one’s experience by discerning aspects or relations between aspects that were not discerned previously, and thereby perceiving the world in a more differentiated way (Gibson & Gibson, 1955). The theory also states that this discernment presupposes an experience of variation. If an aspect is opened up as a dimension of variation, it is likely to be made visible to the learner. This theory thereby serves as a suitable framework for designing teaching and studying learning in commensurable terms.

3.1 Theoretical and empirical background

One of the cornerstones of the intervention program was findings from earlier research by Neuman (1987, 2013) on arithmetic learning among 7-year-olds. Neuman demonstrated that when solving a task such as 2+?=9, many children had to count on from two in single units while at the same time keeping track of their counting (e.g., by raising one finger for each number), whereas others used their hands to visualize the structure and relations between the numbers. The latter group started by showing the whole – nine fingers – with both hands (five on one hand, four on the other), then folded two fingers (of the four), looked at their hands, and said: “seven.” Neuman concluded that this way of making numbers visible via finger patterns provided (i) a possibility for the children to experience the numbers as a part-whole relation, and (ii) support for prioritizing children’s experience of numbers over memorization and practicing of arithmetic skills.

A study of 4- to 5-year-olds (Björklund et al., 2018, 2022; Kullberg et al., 2020), who had not yet formally been taught arithmetic, demonstrated a spontaneous use of finger patterns (86%) when solving a simple subtraction task. Furthermore, the success rate (71% correct answer) was associated with experiencing the task as a part-whole relation and showing structured finger patterns. These results confirmed Neuman’s findings and gave a solid ground for ascertaining that learning number part-whole relations by finger patterns could be a pathway to arithmetic learning.

However, it was noted that this presupposes the simultaneous discernment of the cardinal and ordinal aspects of numbers. Hence, just experiencing the visible part-whole structure in the finger pattern is not sufficient; there are aspects of numbers which must be discerned if the student is to be able to experience the number relations and find the missing number in a task. This idea of discernment of critical aspects for learning the first ten numbers was further developed by Björklund et al. (2021). To experience numbers in a more powerful way, they argue, there are aspects (besides cardinality and ordinality) which it is critical to discern simultaneously: part-whole relation, commutativity, and the inverse relation between addition and subtraction. When learners fail to solve an arithmetic task or are unable to use certain strategies, this is due to not having discerned all the critical aspects (Björklund et al., 2021). These critical aspects have been further developed and specified in regard to a higher number range, and are embedded and manifested in the activities in the present intervention program (see Table 1).

Table 1 Critical aspects embedded in activities enacted by means of invariance/variation in the intervention

Variation theory states that when an aspect is discerned, it is experienced as a dimension of variation; that is, as something that can vary. So, for instance, in order to discern the part-whole aspect of a number, it must be experienced as a composition of pairs of other numbers (e.g., 7 = 3 + 4, 7 = 5 + 2, 7 = 1 + 6). However, to learn this requires more than just practicing and remembering the combinations (number bonds). Instead, the learner must have the possibility to experience how these numbers are related (Kullberg et al., 2020).
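
To make the dimension of variation concrete, the following sketch enumerates every way of splitting one invariant whole into a given number of parts. It is only an illustration under our own assumptions: the helper name `partitions` is hypothetical, and listing the combinations is not a substitute for experiencing how the numbers are related.

```python
from itertools import combinations_with_replacement


def partitions(whole: int, n_parts: int) -> list[tuple[int, ...]]:
    """All ways to split `whole` into `n_parts` positive parts, ignoring order."""
    return [
        parts
        for parts in combinations_with_replacement(range(1, whole + 1), n_parts)
        if sum(parts) == whole
    ]


print(partitions(7, 2))   # the whole 7 kept invariant, the two parts vary
print(partitions(12, 3))  # the whole 12 kept invariant, three parts vary
```

Keeping the whole fixed while the parts change mirrors the pattern of invariance and variation described above.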

Principles from variation theory – critical aspects, discernment, and variation/invariance – along with empirical findings from previous research framed by these principles, were the fundamentals in the intervention program (Table 1). They also guided the teachers and researchers when designing the activities in the intervention. The teachers were instructed to direct students’ attention towards the critical aspects during the lessons by enacting the patterns of variation and invariance shown in Table 1. For example, in the activity Partitioning numbers into two and three parts, there were several critical aspects (Numbers can be partitioned, Numbers can be represented in different ways, Seeing parts in parts) that should be brought to the fore by the teacher for the students to experience. In the activity, the number 12 is invariant to allow the students to see different parts of the same number (varying). Partitioning 12 into three parts (e.g., 5/5/2; 5/4/3) and into two parts (e.g., 10/2; 7/5) made it possible for the students to experience different ways of partitioning the same number and thereby discern the structure of part-whole relations. The following excerpt illustrates two students working with the activity.

The students are discussing their second worksheet (to the left in Fig. 1), which involves working on partitioning 12 into 3 parts and next composing them into two parts. The students have partitioned 12 into 4/3/5. Then they composed 4 and 3 to make two parts of 12, i.e., 7 and 5. The teacher comes to the table.

Tim: First I had 10 and she had 5, then it was 3 [i.e., three was taken away from Tim’s 10 fingers to make 7]. She still had 5 [5/2/5].

Lisa: Yes, it is correct. Look here. That 3 [the remaining 3 on Tim’s right hand that Tim talked about] and 2 here [shows 5 as 3 on Tim’s right hand and 2 on Lisa’s left hand, 7 + 3 + 2 = 12. Lisa’s 5 was decomposed into 3 (on Tim’s hand) + 2 (on Lisa’s)]. 12 is 2 more than 10.

Teacher: Exactly, I understand what you mean. Then you will get one more worksheet [to do another partitioning of 12].

Fig. 1 A pair of students work with the activity Partitioning numbers into two and three parts. The students show 7 + (3 + 2) = 12 using finger patterns

4 Method

The study had a quasi-experimental design and included an intervention group and a control group. In this paper we report results from analyzing student interviews at three time points to investigate short-term and long-term effects on learning outcomes.

4.1 Participants in the intervention and control groups

The intervention group comprised students taught by four experienced teachers from three different schools in the same municipality near a larger Swedish city (middle socioeconomic status with some immigrant students). The teachers met the researchers every two to three weeks for 1.5 h over a period of eight months to plan, analyze, and revise lessons in the intervention. The researchers and the teachers had different roles in the intervention. The critical aspects and the theoretical principles were introduced to the teachers by the researchers. How these should be manifested in class by means of variation embedded in activities was jointly discussed and decided. The lesson plan had to be adapted to the particular context and the teachers’ knowledge about their students. The teachers enacted the collaboratively planned lessons in their classes, and video recorded them in order to make it possible to analyze and discuss the enactment of the critical aspects. Each teacher had a project journal in which they wrote notes about their teaching as a support for the discussions.

The control group comprised students taught by three experienced teachers from two other schools in two other municipalities (35 students had written consent to participate). The control group was chosen due to the schools being situated in school districts having a similar socioeconomic status to that of the intervention group. One school was situated in the center of a larger Swedish city, and the other was in a municipality near the larger city. These teachers used their ordinary plan for teaching. The researchers met one of the teachers four times and had two joint meetings with the two others (who were co-teaching one lesson in their two classes every week) to discuss video-recorded lessons of their teaching about addition and subtraction. Two meetings were cancelled by the two teachers due to workload. The meetings used stimulated recall, where the teachers made comments about the teaching and could stop the video and explain why the lesson was taught in a particular way. Hence, the video-recorded lessons were used to gain insight into how the topic was taught in the control group. The analysis of the recorded lessons showed that the teachers did not use a structural approach in the recorded lessons, and the use of single-unit counting was frequent. This was shown by, for example, teaching students to count in single steps on a number line in addition and subtraction, and students solving tasks in the textbook using this method. In the textbooks, tasks about part-whole relations existed but were less frequent. Note that the Swedish National Curriculum is goal-oriented, and hence it just describes the expected learning outcomes without prescribing any particular teaching approach. The teacher has space and freedom to choose the teaching approach, including whether to teach with or without using a textbook.

In both the intervention group and the control group there were about 25–30 students per class. Students in the intervention and control groups had participated for one year in a preparatory class (pre-school class), which is compulsory in Sweden. All participating teachers in both groups and the legal guardians of the students who chose to participate provided their signed written consent for participation.

4.2 Interviews

This paper reports the results of analyzing 363 individual student interviews (intervention group N = 86, control group N = 35) from three points in time: before (Interview 1), immediately after (Interview 2), and one year after the intervention (Interview 3). All classes, except one (N = 23) in the intervention group, were taught by the same teachers in second grade. Each individual interview lasted for 20–30 min and was video-recorded. The type of task-based interview conducted had been tested with positive outcomes in a previous study (Kullberg et al., 2020).

The assessment was a mix of orally presented story problems (8 + 5=?, 15–7=?, 6+?=13, 24–?=15, see Appendix 1), and items with numerals (11 = 5+?, 6+?=13, 16+?=23, 14–?=6). Follow-up questions were posed to the students on all items; for example, “How do you know it [the answer] is x?” and “Please show me what you did when you solved the task.” The items were the same in all three interviews, except that in Interview 3, additional items with numerals in a higher number range (28 + 44=?, 83–7=?, 204–193=?, 204–12=?, 132–78=?) were used to see whether the students were able to solve the problems by structuring. No manipulatives or pen and paper were used when solving the problems but the students were allowed to use their fingers.

The interview data were analyzed in the following ways. The first analysis concerned frequencies of correct and incorrect answers (coded 1 for correct and 0 for incorrect, see Figs. 2 and 3), and the second analysis combined correct and incorrect answers with how the student solved the task (see Table 2; Fig. 4).

Table 2 Coding system for the second analysis

For example, if a student counted backwards, saying “83 minus 7, that’s 82, 81, 80, 79.…”, the solution was coded as single-unit counting. Conversely, if a student used parts of numbers larger than a single unit, for example saying “I take 83 minus 3 from 7, mmm, is 80. 80 minus 4, is 76”, this was coded as structure. In some cases, it was not possible to code how the student solved the task; this was coded as “No code.” Due to ethical concerns, we stopped giving more tasks when a student gave two consecutive incorrect answers in the same category of tasks. When a student was not given a question, we coded this as “No question”. The coding process for this qualitative analysis included the following steps: first, the research group made the coding scheme and discussed examples of coding. After this, three researchers coded the items for different classes, and cases that were difficult to code were discussed within the research group. In a final step, all coded items were double-checked for errors by one of the researchers, resulting in only a small number of observations being recoded (2 observations of 121 were recoded regarding 83–7=?).
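
The contrast between the two codes can be sketched as follows. This is a simplified model for illustration only, assuming a single-digit subtrahend; the function names `count_back` and `bridge_through_ten` are hypothetical and do not describe the coding procedure itself.

```python
def count_back(minuend: int, subtrahend: int) -> list[int]:
    """Single-unit counting: every number said on the way down."""
    return [minuend - step for step in range(1, subtrahend + 1)]


def bridge_through_ten(minuend: int, subtrahend: int) -> list[int]:
    """Structure: go down to the ten below first, then remove the remaining part."""
    to_ten = minuend % 10  # the part that reaches the ten below
    if to_ten == 0 or subtrahend <= to_ten:
        return [minuend - subtrahend]  # no bridging is needed
    rest = subtrahend - to_ten  # the part of the subtrahend still to remove
    return [minuend - to_ten, minuend - to_ten - rest]


print(count_back(83, 7))          # seven single-unit steps
print(bridge_through_ten(83, 7))  # two steps: 7 is used as 3 and 4
```

In this model the structured solution needs two steps regardless of how far the count would have to run, whereas the number of counting steps grows with the subtrahend.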

5 Results

The results are presented in three parts. First, we present the comparison of pre-intervention results for intervention and control groups to establish the equivalence of the groups. Then, we compare the intervention and control groups in terms of the addition and subtraction tasks used in all three interviews. We then report the frequency of correct answers on the tasks in a higher number range used in Interview 3. Finally, we focus on how the students solved the tasks in the third interview (second grade).

5.1 Comparison of intervention and control groups

We calculated the total score (total number of correctly solved tasks) for each student in each interview. This was done for eight tasks that were the same in the three interviews, so each student could have a result ranging from 0 to 8 in any of the interviews. Reliabilities for the total score were α = 0.721, 0.704, and 0.625 for Interview 1, Interview 2, and Interview 3, respectively. These moderate reliabilities, estimated as internal consistency, are to be expected considering that the total score consists of dichotomous items of heterogeneous difficulty. As Cronbach’s alpha is estimated based on inter-item correlations, its size is sensitive to the fact that the tasks were created to be of heterogeneous difficulty. We then compared the intervention and control groups within the same interview. The intervention (M = 2.14, SD = 1.95) and control group (M = 2.49, SD = 1.90) were equally successful (t(119) = 0.891, p = .375, with equal variances, F = 0.114, p = .736) on the total score for eight addition and subtraction tasks in the number range 1–20 in the pre-intervention assessment (Fig. 2, Interview 1). The comparison on the task level revealed the same results. A chi-square test for each task showed that the two groups did not differ on any of the eight tasks prior to the intervention (task 1: χ² = 0.740, p = .390; task 2: χ² = 0.291, p = .590; task 3: χ² = 1.00, p = .317; task 4: χ² = 0.767, p = .381; task 5: χ² = 0.283, p = .595; task 6: χ² = 0.474, p = .491; task 7: χ² = 0.240, p = .624; task 8: χ² = 0.124, p = .723). These results indicate that the groups were comparable and hence suitable for further analyses.
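
For readers who wish to check task-level statistics of this kind, the sketch below converts a chi-square statistic into its p-value. It assumes one degree of freedom (a 2 × 2 table of group by correct/incorrect), which is our assumption and is not stated explicitly in the text; the function name is hypothetical.

```python
import math


def chi2_p_value_df1(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    # With df = 1 the chi-square variable is a squared standard normal,
    # so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Three of the reported task-level statistics, used as examples.
for task, statistic in [(1, 0.740), (3, 1.00), (7, 0.240)]:
    p_value = chi2_p_value_df1(statistic)
    print(f"task {task}: chi2 = {statistic:.3f}, p = {p_value:.3f}")
```

Only the standard library is used, so the closed form for one degree of freedom replaces a general chi-square distribution function.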

At the immediately post-intervention assessment (Interview 2), the intervention group showed a higher mean (M = 5.56, SD = 2.03) than the control group (M = 4.86, SD = 2.24). The difference between the groups persisted at the one-year assessment (Interview 3), where the intervention group showed a mean of 6.83 (SD = 1.56) and the control group a mean of 6.14 (SD = 1.52).

The learning outcomes for both groups increased between the three assessment interviews. In order to test the effectiveness of the intervention, we conducted a mixed ANOVA (using version 28.0 of IBM SPSS Statistics), with interview (Interview 1 vs. Interview 2 vs. Interview 3) as within-group factor and group (intervention vs. control) as between-group factor. Mauchly’s sphericity test showed that the sphericity assumption was not violated (W = 0.959, p = .083), and Levene’s test indicated that variances were homogeneous for all levels of the repeated-measures variable. The main effect of group was not significant (F(1,119) = 1.350, p = .248, ηp² = 0.011) and the main effect of interview was significant (F(2,238) = 232.936, p < .001, ηp² = 0.662), suggesting that the groups did not differ on average and that the increase in outcome over time was significant. Most importantly, the interaction (Fig. 2) between group and interview was significant (F(2,238) = 4.579, p = .011, ηp² = 0.037), showing that the profile of change in outcome differed between the control and intervention groups, with the intervention group showing a greater increase over time in comparison to the increase seen in the control group. This shows that the intervention had a positive effect on the results of the intervention group.
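
The reported effect sizes can be related to the F statistics through the standard identity for partial eta squared, ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two main effects; the function name is hypothetical and the code is a check on the arithmetic, not a reproduction of the SPSS analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    effect_term = f_value * df_effect  # proportional to the effect sum of squares
    return effect_term / (effect_term + df_error)


# Main effect of group, F(1,119) = 1.350
print(round(partial_eta_squared(1.350, 1, 119), 3))
# Main effect of interview, F(2,238) = 232.936
print(round(partial_eta_squared(232.936, 2, 238), 3))
```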

To better understand this interaction, we used contrast analysis to compare students’ results in consecutive assessments across the control and intervention groups (Fig. 2). A significant interaction was found when comparing the control and intervention groups between Interview 1 and Interview 2 (F(1,119) = 6.089, p = .015, ηp² = 0.049), but the further interaction effect between Interview 2 and Interview 3 was not significant (F(1,119) = 0.003, p = .959, ηp² = 0.000). These results show that during the intervention period the intervention group progressed faster than the control group, and that the difference between the groups remained steady over the course of one year after the intervention. In other words, the advantage that the intervention group gained during the intervention was sustained during the follow-up period. Both groups became better at solving the tasks during the follow-up period, but the control group did not catch up with the intervention group; instead, the intervention group remained more successful. Considering the rather long follow-up period, these results show that the intervention had long-lasting effects on the students’ performance.

Fig. 2 Average number of correctly solved tasks among the intervention and control groups across three interviews (Kullberg et al., 2022)

5.2 Item-level analysis

To further study the long-term effects of the intervention, we analyzed the tasks in Interview 3 individually, allowing us to see whether the items using a higher number range produced different results (Fig. 3). The intervention and control groups showed only a small difference in percentage of correct answers to the items 8 + 5=? and 15–7=?; both of these items are straightforward tasks in a lower number range that were used in all three interviews. However, when the number range increased, a larger proportion of the intervention group than of the control group was able to solve the more difficult items. The largest differences were found on the subtraction items 32–25=? and 83–7=?, which were solved by 51% and 73% of the students in the intervention group compared to 31% and 49% of the control group. We also found that the intervention group solved two addition items, 15 + 17=? and 28 + 44=? (78%, 73%), and two subtraction items, 204–193=? and 132–78=? (35%, 22%), more successfully than the control group (where the results were 60%, 51% and 17%, 6% respectively).

Fig. 3 Percentage of correct answers on orally presented story problem tasks (first five items) and tasks with numerals (last five items) in Interview 3 (Kullberg et al., 2022, p. 88)

In addition to analyzing the results at item level, we calculated the total score for items in the higher number range in Interview 3. This score (Cronbach’s α for the eight tasks = 0.769) was calculated as the total number of correct answers on eight difficult tasks (24–?=15, 15 + 17, 32–25, 28 + 44, 83–7, 204–193, 204–12, 132–78). An independent-samples t-test suggested that the intervention group (M = 4.30, SD = 2.45) was significantly more successful than the control group (M = 2.75, SD = 2.12) in solving these tasks after the follow-up period (t(129) = 3.464, p < .001; d = 0.657). The difference in solving tasks in a number range > 16 suggests that more students in the intervention group had learned how to handle such tasks.

5.3 Analysis of student solving

We also analyzed the strategies used for solving the items. Although the success rate for the items 8 + 5 and 15–7 was high in both groups (Fig. 3), the analysis showed a difference in the strategies used for solving the tasks. In the control group, around a third of the observations of solving 8 + 5 were coded as single-unit counting, while in the intervention group this was much less common; only 7% were coded as single-unit counting. When calculating 15–7, none of the correct solutions in the intervention group were coded as single-unit counting, whereas in the control group about a third were coded in this way. Hence, observations coded as both correct and structuring were more common in the intervention group in the number range < 16.

Next, we analyzed the data with a focus on similarities between strategies used in different number ranges. We chose a single item that was similar to those mentioned above but in a higher number range (83–7=?), and examined differences between the groups (Fig. 4). More than 60% of the students in the intervention group used a structuring strategy to solve the task (e.g., 83–3 = 80, 80–4 = 76) and ended up with the correct answer, compared to about 30% in the control group. Hence, on this item, structuring was twice as common in the intervention group. These results suggest that there were similarities in the strategies used for solving two subtraction problems that were similar to each other but in different number ranges. Structuring as a strategy (and correct answer) was found to a greater extent in the intervention group than in the control group, in both a lower and a higher number range. One student (Philip, intervention group) solved 83–7=? in the following way. He said: “I take 83 minus 3 from 7, mmm [Looks at the 7 fingers he is holding up, showing 5 fingers on one hand and 2 on the other], is 80. 80 minus 4, is 76”. Philip most likely saw the 3 and the 4 within 7 and used that to solve the task involving passing the ten. Another student (Emma, intervention group) said: “76. I take the 7 and make it an 8, and then take the 3 so, 83 minus 3, and then you have a 5 left [of the 8] but make it into a 4 [since it was 7 and not 8], so 80 minus 4 is 76”. Emma seems to structure 8 into parts of 5 and 3, since she probably thinks it is easier to see parts of 8, while also being aware that she should subtract 7 and not 8, and therefore subtracts 80–4 = 76. Both groups had some observations on the item 83–7=? that were coded as single-unit counting and correct answer. However, this was twice as common in the control group (17%) compared to the intervention group (9%).
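
The structuring strategy described by Philip (83–3 = 80, 80–4 = 76) can be written out as a short procedure: split the subtrahend into the part that reaches the ten and the remainder. The following Python sketch is only an illustration of that decomposition, not a model of the students' thinking; the function name `subtract_bridging_ten` and its restriction to single-digit subtrahends are assumptions made for the example.

```python
def subtract_bridging_ten(minuend: int, subtrahend: int) -> list[tuple[int, int, int]]:
    """Subtract a single-digit number by first going down to the ten below.

    Returns the steps as (start, amount_removed, result) tuples.
    """
    if not 0 < subtrahend < 10:
        raise ValueError("this sketch only handles single-digit subtrahends")
    if minuend < subtrahend:
        raise ValueError("this sketch only handles non-negative results")

    ones = minuend % 10
    if ones == 0 or subtrahend <= ones:
        # The ten is not passed, so the subtraction is a single step.
        return [(minuend, subtrahend, minuend - subtrahend)]

    # Partition the subtrahend: first the ones of the minuend, then the rest.
    first_part = ones
    second_part = subtrahend - ones
    ten = minuend - first_part
    return [(minuend, first_part, ten), (ten, second_part, ten - second_part)]


for start, removed, result in subtract_bridging_ten(83, 7):
    print(f"{start} - {removed} = {result}")
```

For 83–7 this prints the two steps 83 - 3 = 80 and 80 - 4 = 76; the same partitioning applied to 15–7 gives 15 - 5 = 10 and 10 - 2 = 8.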

The difference between the groups as regards structuring versus single-unit counting in a subtraction in the < 16 number range also seemed to exist for an item in a higher number range. Since structuring was found to be more common among the intervention group in a lower as well as a higher number range, the results suggest that the intervention also influenced how the students in the intervention group solved tasks in a higher number range.

Fig. 4 Students’ ways of solving 83–7=? (using structure or single-unit counting) in Interview 3, shown as percentages (Kullberg et al., 2022, p. 88)

Furthermore, single-unit counting did not seem to always be successful in a higher number range. As seen in Fig. 4, 20% of the control group and less than 5% of the intervention group used single-unit counting and failed to solve the task 83–7=?. One student (Mike, control group) counted backwards saying: “Hmm, 82, 81, 80, 79, 78, 77, 76, 75.” When he counted, we looked at his hands, where he was showing 5 fingers on one hand and 3 on the other hand (excluding the thumb and little finger). It is possible that he made a mistake by showing 8 fingers instead of 7.
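
Mike's counting sequence can be reproduced to show how one extra count produces the wrong answer. The following Python sketch assumes, as suggested above, that the error came from counting back 8 units instead of 7; the function name `count_back` is hypothetical.

```python
def count_back(start: int, counts: int) -> list[int]:
    """Return the numbers said aloud when counting back `counts` single units from `start`."""
    return [start - step for step in range(1, counts + 1)]


seven_counts = count_back(83, 7)  # the intended subtraction, 83 - 7
eight_counts = count_back(83, 8)  # one count too many

print(seven_counts, "->", seven_counts[-1])
print(eight_counts, "->", eight_counts[-1])
```

Counting back 7 units ends at 76, whereas counting back 8 units gives the sequence 82, 81, 80, 79, 78, 77, 76, 75 that Mike said aloud.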

6 Discussion and conclusions

In this study, the long-term effects of an intervention program were investigated. In the intervention, the teaching was based on a structural approach and results from previous research, regarding what students need to discern in order to develop an understanding of part-whole relations of numbers, and systematically designed based on principles of variation and invariance to make this discernment possible. The study shows that the intervention had a positive effect on the results of the intervention group, hence affirmatively answering the research question: (i) Does a structural teaching approach in first grade increase students’ skills in solving addition and subtraction tasks compared to a control group? We would conclude that the intervention contributed to the students, one year later, being able to solve tasks in a higher number range that had not been included in the intervention. The results demonstrated a significant difference in that regard compared to a control group. We found that the students used a structuring strategy on tasks in the higher number range to a greater extent than the control group, hence answering the research question: (ii) How does a structural approach affect students’ ways of solving addition and subtraction tasks? In the intervention group, more than 60% of the students used structure to solve the task (83–7=?) and answered correctly, whereas only about 30% in the control group did so. This improvement is explained by the fact that students in the intervention group had the opportunity to discern the empirically identified critical aspects for solving the addition and subtraction tasks using a structural approach. When encountering items in a higher number range (three-digit numbers), students in the intervention group seemed to be able to generalize what they had learned (i.e., critical aspects such as number relations, decomposition of numbers, and using ten as a benchmark) in a lower number range.

Students who solve the task using single-unit counting most likely do not experience addition and subtraction in the same way as students who are able to use structure to arrive at the answer (see Björklund & Runesson Kempe, 2022). A student who uses single-unit counting has, from the theoretical perspective taken in this study, most likely not yet experienced addition and subtraction as part-whole relations and has not grasped that the subtrahend can be partitioned and ten used as a benchmark. Hence, the theoretical framework is beneficial for addressing which critical aspects need to be discerned, and which ones a particular student has not yet discerned, to solve tasks using a structural approach.

The results are consistent with earlier intervention studies in which younger students (4–5 years), taught with a structural approach from the outset and guided by the variation theory of learning, improved at solving addition and subtraction tasks (e.g., Kullberg et al., 2020). The present study adds new knowledge by showing the effect on the strategies students used for solving tasks involving bridging through ten and how finger-based learning as an element in a structural teaching approach may contribute to developing strategies where finger patterns cannot be used. Our study shows an increase in student learning on all task types used (several tasks indicate this), and also on tasks in a higher number range than was used in the intervention. This differs from findings from a previous intervention study (Wolters, 1983) emphasizing part-whole relations, which found that first and second graders’ learning increased most on missing addend problems or partitioning of a whole (e.g., ?–9 = 8, 15 = ? + 8).

Our findings contribute to the debate on whether a counting approach or a structural approach is more beneficial for young students’ early arithmetic learning. Previous research, dominated by cognitive science (Baroody, 2016; Fuson, 1992; Steffe, 2004; Steffe et al., 1988), has argued that young students learn addition and subtraction through acquisition of basic counting strategies. This presumption has been debated in recent years (Cheng, 2012; Coles & Sinclair, 2018; Hopkins et al., 2022; Neuman, 2013), with researchers suggesting that counting as an initial arithmetic strategy may instead hinder students’ ability to solve tasks in higher number ranges (Cheng, 2012; Hopkins et al., 2022).

In this study, the teachers and researchers collaborated on developing the activities used to elicit the critical aspects, which most likely increased the quality and relevance of the intervention (Bulterman-Bos, 2008). Together with the researchers, the teachers designed the activities based on principles from variation theory to make the critical aspects discernible. They discussed each activity from the point of view of which critical aspects should be elucidated and what should be varied and kept invariant for the critical aspect to become noticeable to the students. The teachers had the opportunity to collaboratively analyze the teaching by means of video-recorded lessons and refine the enactment of the activities. The activities were relatively few (10 during a whole year) and were enacted several times (to make it more likely for students to experience the critical aspects). This most likely helped the teachers to refine the enactment of activities. The teachers teaching the control group did not have the same opportunity to discuss and refine their teaching, which may have affected the quality of teaching; neither were they familiar with how variation theory can guide lesson design. In studies like these, the teacher effect cannot be neglected. There may be positive effects solely from additional teacher commitment. However, the effects found in comparison with the control group are significant. The small size of the control group was caused by the fact that several students in the control classes did not hand in written consent to participate in the study, which may have affected reliability. A limitation of this study is that only one of the tasks in a higher number range was analyzed according to the strategy used (83–7=?). This task was selected since it is a two-digit number subtraction task requiring bridging through ten, which we expected second-grade students to be able to solve. If more tasks of a similar kind had been analyzed, we could possibly have drawn more general conclusions about students’ ability to apply a structural strategy. What we have been able to show, however, is that the intervention group did not apply single-unit counting to the same extent as the control group. We infer from this that the students preferred to use a structural strategy for solving this type of task. We are also aware that the first round of coding was carried out by different researchers, which may have threatened reliability, but there were only a few discrepancies when one person in the research team double-checked the coding on all items.

Our findings have implications for teaching in early grades and the writing of mathematics textbooks, since they show that using a structural approach combined with principles from variation theory from the outset when teaching arithmetic to young students may be more successful than starting with a counting-based approach. It has been argued that the strategies that are learned first have a tendency to be used later as well (Cheng, 2012). If students are taught to structure numbers to solve addition and subtraction tasks in first grade (or earlier), this may be helpful for learning to solve tasks in higher number ranges later and may reduce the number of students using single-unit counting in higher grades (Ellemor-Collins & Wright, 2009).