1 Introduction

In the practice of mathematics, the use of inscriptions, i.e., representations that exist in material form (Roth & McGinn, 1998), is essential. Without such material representations, it is virtually impossible to acquire a mathematical understanding (Dörfler, 2008; Goldin & Shteingold, 2001). Accordingly, representation is now normatively set as a mathematical competence in the standards and curricula of many countries (e.g., NCTM, 2000). This competence is often associated with problem solving. Furthermore, in research, the flexible and adaptive use of different representations is considered essential for mathematical problem solving (Heinze, Star, & Verschaffel, 2009). The generation of a graphic representation is regarded as an important heuristic (Hembree, 1992). At the same time, it is often reported that many learners rarely use graphic representations as heuristics (Fagnant & Vlassis, 2013; Lopez Real & Veloo, 1993). This is particularly evident at the primary school level, where interventions involving graphic representations for problem solving often have almost no positive effects (Hembree, 1992).

Thus, a tension between mathematically and didactically motivated ideas on the one hand and the use by learners at the primary school level on the other hand becomes apparent. In intervention studies, graphic representations are mostly examined as instruments for problem solving (e.g., Van Essen & Hamaker, 1990). There is a gap in the research on the question as to how learner-generated graphic representations develop when they are created as documents for others and reflected upon in class. The study described here makes a contribution in this direction.

2 Theoretical background

2.1 Graphic representations for word problems

Various forms of representations can be distinguished (Goldin & Shteingold, 2001). A distinction between descriptive and depictive representations (Schnotz, 2002) seems to be useful in the case of graphic representation for word problems. We will see below that word problems are a particular example of descriptive representations, while graphic representations form a subset of depictive representations.

Descriptive representations consist of symbols and are associated with the content they represent by means of a convention. They contain relational characters for structural mapping (Schnotz, 2001). Texts and mathematical equations, for example, are descriptive representations. To describe something in a text, nouns are related to each other by verbs and prepositions. In contrast, the signs used in depictive representations “are associated with the content they represent through common structural features” (Schnotz, 2002, p. 103). Depictive representations have structural properties that correspond to properties of the facts to be represented. In contrast to descriptive representations, information can be read off directly. Thus, they enable efficient inference processes and are particularly well suited to prompting inferences in the observer, which is important for problem solving (Larkin & Simon, 1987; Schnotz, 2001).

Word problems are forms of descriptive representations. They can be understood as tasks presented in text form, where the content is largely meaningless and interchangeable (Schipper, 2009). The focus is on the verbally described mathematical relationships. Figure 1 shows an example of a word problem. Semantically different word problems can describe the same mathematical relationships and lead to the same mathematical operations (Verschaffel, Greer, & De Corte, 2000). Examples used in this study can be found in Table 6 in the Appendix. A change of the linguistic surface structure can influence the degree of difficulty. If, for example, the information in the text is mentioned in the order necessary for processing, this has a positive effect on the solution process (Stern, 1998). Word problems are often criticized (Verschaffel et al., 2000): One main criticism is that word problems that are used in classroom practices often do not have any genuine references to reality, but are artificial problems. At the same time, it is reported that students often “solve these problems in a stereotyped and artificial way” (Verschaffel et al., 2000, p. 12). However, depending on how they are used in teaching, word problems also have the potential to develop several mathematical competences (Verschaffel et al., 2000). For example, they can be used to “develop new mathematical structures, notations, etc in the course of exploring the modeling of phenomena” (Verschaffel et al., 2000, p. 173).

Fig. 1 Graphic representations (with translation) and the associated word problem (Ott, 2016)

Graphic representations are forms of depictive representations. The term graphic refers to representations consisting of lines and dashes (Cox, 1999). Sketches and drawings can be understood as graphic representations. In this study, graphic representation refers to paper and pencil representations. Graphic representations are characterized by aspects of space that are mapped onto content elements (Stern, Aprea, & Ebner, 2003). In graphic representations, therefore, due to the position of the individual elements, relationships become directly apparent (Larkin & Simon, 1987).

In a preliminary study (Ott, 2016), three key features of graphic representations for word problems were identified: mathematical structure, mathematical matching, and degree of abstraction.

A mathematical structure may be defined based on set theory (Rinkens, 1973): Relationships between amorphous elements of a set can be determined by defining linkages on the set. A structure is thereby imposed on the set. In word problems, information is presented with quantities and nouns, which are related to each other by verbs and prepositions. The word problem is thus given a mathematical structure. For a graphic representation, it is necessary to invent signs for the objects to be represented, e.g., quantities and nouns. Relationships between these signs are determined by the arrangement of the signs on the sheet. Since the mathematical structure is defined on the signs of these objects, they are called structurally relevant objects. Figure 1 shows six examples of graphic representations. While in a, c, d, e, and f trees or rulers are the signs for structurally relevant objects, in b these are circles as elements of a set. Examples a, b, and c show the mathematical structure of a linear equation in the arrangement of the signs for the structurally relevant objects. Example d only shows the relationship between 15 cm and 1 year. Examples e and f only show structurally relevant objects, e.g., for the quantity 83 cm, without an arrangement. Graphic representations with a mathematical structure can be understood as signs with a relational character, whose perceptible basis is an inscription. Therefore, they take on the character of diagrams (Dörfler, 2008).

There is mathematical matching between a word problem and a graphic representation if both are “informationally equivalent” (Palmer, 1978). This is the case if both of the following conditions are satisfied. Firstly, there is a match regarding the objects, i.e., between the quantities on the text side, which can be split into measured value (e.g., 83) and measuring unit (e.g., cm) (Kirsch, 1997), and the signs for the structurally relevant objects on the graphic side. Secondly, there is a match regarding the operations, i.e., between the verbs and prepositions on the text side and the arrangement of the signs for the structurally relevant objects on the graphic side. In a, b, d, and f of Fig. 1, there is complete matching regarding the measured values: all measured values given in the task (15 cm, 83 cm) can be identified in the graphic representation. In c and e, this matching is partial, since only the 15 cm (c) or the 83 cm (e) is visible. In b, the measuring unit is not considered, whereas it is considered in the other examples. Regarding operations, there is complete matching in a, b, and c, partial matching in d, and no matching in e and f.

According to Peschek (1988), the degree of abstraction can be characterized as the degree of focusing on the representation of the word problem’s mathematical aspects. Two indicators are identified: a focus on the structurally relevant objects (indicator 1) and a focus on the mathematically relevant qualities of the structurally relevant objects (indicator 2). In a, b, c, d, and e of Fig. 1, indicator 1 can be considered high since no other objects are drawn. In f, it is low, because a fence is also drawn. Indicator 2 can be considered high in b and low in the other examples, because detailed trees are drawn. The key features can be analyzed independently of each other. However, the prerequisite for determining the mathematical matching and the degree of abstraction is that structurally relevant objects can be identified in the graphic representation.

2.2 Types of graphic representations for word problems

In the literature, different types of graphic representations are distinguished. A distinction is often made between schematic and pictorial representations (Hegarty & Kozhevnikov, 1999; Presmeg, 1986): In schematic representations, the focus is on the spatial relations described in a problem; in pictorial representations, the focus is on the visual appearance of the objects described in a problem (Hegarty & Kozhevnikov, 1999). Rellensmann, Schukajlow, and Leopold (2017) make a similar distinction regarding the process of mathematical modeling by distinguishing between mathematical drawings and situational drawings: In mathematical drawings, the focus is on the mathematical model described in the problem; in situational drawings, the situation described in the problem is pictorially depicted.

If the learners are responsible for both the process of generating the graphic representation and the final product, we speak of learner-generated graphic representations (Van Meter & Garner, 2005). A study by Sherin (2000) indicates that learner-generated graphic representations can differ in even more diverse ways. Based on a preliminary study (Ott, 2016, 2017), the following types of learner-generated graphic representations for word problems can be distinguished at the primary school level:

  • Off the text: there is no link to the text with regard to the content.

  • Illustrative: there is a link to the text, but no structurally relevant objects are represented.

  • Object-related: there is a link to the text and structurally relevant objects are represented although relations between them are not identifiable in the arrangement.

  • Diagrammatic: there is a link to the text, structurally relevant objects are represented, and relations between them are identifiable in the arrangement.
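The four types can be read as a short decision procedure. The following Python sketch only illustrates that reading; the function name `classify_representation` and its three boolean parameters are hypothetical and are not taken from the analysis tool in Ott (2016, 2017).

```python
def classify_representation(linked_to_text: bool,
                            shows_relevant_objects: bool,
                            shows_relations: bool) -> str:
    """Assign one of the four types listed above (illustrative sketch)."""
    if not linked_to_text:
        # No content-related link to the word problem at all.
        return "off the text"
    if not shows_relevant_objects:
        # Linked to the text, but no structurally relevant objects.
        return "illustrative"
    if not shows_relations:
        # Objects are drawn, but their arrangement shows no relations.
        return "object-related"
    # Objects are drawn and relations are identifiable in the arrangement.
    return "diagrammatic"


print(classify_representation(True, True, False))  # object-related
```

Each check depends on the previous one, which mirrors the way every category in the list adds one condition to the category before it.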

Figure 1 shows four examples of graphic representations that are diagrammatic (a, b, c, d) and two examples that are object-related (e, f). Figure 2 shows one graphic representation for the word problem that is off the text (a) and one that is illustrative (b).

Fig. 2 Graphic representations for the given word problem that are off the text (a) and illustrative (b)

Graphic representations that represent parts of the mathematical structure (object-related, diagrammatic) can differ in their degree of mathematical matching and the degree of abstraction (Ott, 2016, 2017). The mathematical matching may be complete, partial, or nonexistent in terms of measured values, measuring units, or operations. The degree of abstraction can be either high or low with regard to the two indicators.
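As an illustration, the codings described in Section 2.1 for the six examples in Fig. 1 can be written down as records. This is a minimal sketch: the class and field names are hypothetical, the measuring units are left out for brevity, and the actual coding procedure is the one specified in the analysis guideline (Ott, 2016, 2017).

```python
from dataclasses import dataclass
from enum import Enum


class Matching(Enum):
    COMPLETE = "complete"
    PARTIAL = "partial"
    NONE = "nonexistent"


@dataclass(frozen=True)
class Coding:
    rep_type: str                  # "diagrammatic" or "object-related"
    measured_values: Matching      # matching of the measured values
    operations: Matching           # matching of the operations
    indicator_1_high: bool         # focus on structurally relevant objects
    indicator_2_high: bool         # focus on mathematically relevant qualities


# Codings as described in the text for examples a to f of Fig. 1.
FIG1 = {
    "a": Coding("diagrammatic", Matching.COMPLETE, Matching.COMPLETE, True, False),
    "b": Coding("diagrammatic", Matching.COMPLETE, Matching.COMPLETE, True, True),
    "c": Coding("diagrammatic", Matching.PARTIAL, Matching.COMPLETE, True, False),
    "d": Coding("diagrammatic", Matching.COMPLETE, Matching.PARTIAL, True, False),
    "e": Coding("object-related", Matching.PARTIAL, Matching.NONE, True, False),
    "f": Coding("object-related", Matching.COMPLETE, Matching.NONE, False, False),
}

# Examples with complete matching for both measured values and operations.
fully_matching = sorted(
    label for label, c in FIG1.items()
    if c.measured_values is Matching.COMPLETE and c.operations is Matching.COMPLETE
)
print(fully_matching)  # ['a', 'b']
```

Keeping the features in separate fields reflects the point that they can be analyzed independently of each other.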

2.3 Teaching graphic representations for word problems

With regard to studies on generating graphic representations for word problems, two types can be distinguished (Fagnant & Vlassis, 2013): encouraging students to use specified diagram types (e.g., Diezmann, 2002) and encouraging students to generate their own graphic representations (e.g., Van Dijk, Van Oers, & Terwel, 2003a). The results are inconsistent in terms of positive benefits. Some studies show that children often find it difficult to use predefined representations (Fagnant & Vlassis, 2013; Pantziara, Gagatsis, & Elia, 2009). Pantziara et al. (2009) suggest that the diagrams did not fit the learners’ personal preferences and mental models and conclude that the interpretation of diagrams is also essential for improving diagram competence. They report that students often tried to transform the given diagrams into pictorial representations to interpret them. Van Dijk et al. (2003a) and Van Dijk, Van Oers, Terwel, and Van den Eeden (2003b) compared the two approaches in the fifth grade and found better results when learners generated their own graphic representations. They conclude “that designing models in co-construction may lead to a deeper insight into the meaning and use of models and consequently make possible a more flexible approach in problem solving” (Van Dijk et al., 2003b). What is so far unclear is the effect of teaching that combines the generation of one’s own graphic representation with the interpretation of given graphic representations.

2.4 Learner-generated graphic representations and problem solving

Learner-generated graphic representations can fulfill two functions (Selter, 1993): as instruments, they function as an aid for problem solving in the sense of a private representation; as documents, they record the results and the solutions. In the latter function, they are public and related to an addressee. Such graphic representations are more complete, more richly inscribed, and more conventional than private representations (Cox, 1999).

The type of graphic representation employed by the student seems to have an influence on the success of problem solving: While there is a positive connection between schematic representation and successful problem solving, the generation of pictorial representations is negatively connected with problem solving success (Hegarty & Kozhevnikov, 1999). These findings are qualified by results regarding modeling tasks: According to Rellensmann et al. (2017), both situational and mathematical drawings are related to modeling performance but in different ways. While the accuracy of mathematical drawings is directly related to modeling performance, the accuracy of the situational drawing is indirectly related, mediated by the accuracy of the mathematical drawing.

Learners hardly use sketches for problem solving (Fagnant & Vlassis, 2013; Lopez Real & Veloo, 1993; Van Essen & Hamaker, 1990). Lopez Real and Veloo (1993) report that the request to draw a sketch improves solution rates. In most studies, however, this improvement does not occur (Hembree, 1992). Hembree’s meta-analysis showed that the use of graphic representations in problem solving can be trained. Comparing different instructional methods, training in drawing diagrams offered the largest improvement in problem solving. However, these positive effects do not yet occur during primary school age (Hembree, 1992). Accordingly, intervention studies by Van Essen and Hamaker (1990) showed that fifth graders could benefit from training in generating graphic representations for problem solving, while first and second graders could not. Similarly, Hembree (1992) concludes “that earlier grades should focus on problem representation instead of stressing solutions” (p. 269). Studies show that primary school children often find it difficult to represent mathematical relationships, preferring instead to illustrate the content of the task (Hasemann, 2006; Ott, 2016). When learners generate graphic representations that contain a great deal of mathematically irrelevant information, this can be problematic because it can make it difficult for them to recognize mathematical structures (Presmeg, 1986). To improve children’s representation skills, an as-yet unexplored approach can be seen in generating graphic representations, not as instruments for problem solving but as documents for later reflection processes in class.

3 Research questions

The following research questions will be examined:

Does an intervention in the 3rd grade based on reflective discussions about children’s graphic representations for word problems

  1. have a positive effect on the attention paid to the key features of graphic representations (mathematical structure, mathematical matching, degree of abstraction) in the learner-generated graphic representations?

  2. have a positive effect on the solution rates?

The questions focus on measurable changes based on the children’s documents. Studies show that generating graphic representations in co-construction leads to learners’ representations becoming less realistic and more focused on mathematical relationships (Van Dijk et al., 2003a). With regard to problem solving, studies show positive effects of interventions that encourage reflection on different approaches in problem solving (Sturm, 2018). Since the intervention in the study presented here combines the generation of one’s own graphic representation with the reflections on graphic representations, it is deduced that the intervention evaluated here has a positive effect on the attention paid to the key features of graphic representations by the 3rd graders. It is therefore assumed that the students generate mathematically correct, abstract graphic representations that are appropriate to the problem. Since there is a positive connection between schematic representation and successful problem solving (Hegarty & Kozhevnikov, 1999), it is deduced that the intervention evaluated here has a positive effect on the solution rates.

The following hypotheses will be tested:

  H1: After the intervention, children in the intervention group pay more attention to the key features of graphic representations, i.e.,

    (a) the mathematical structure,

    (b) the mathematical matching, and

    (c) a high degree of abstraction,

    in their documents than children in the control groups.

  H2: Children in the intervention group solve word problems correctly more often after the intervention than children in the control groups.

4 Method

4.1 Design

The study is designed with three groups (intervention group and two control groups) and three waves of measurement (pretest, posttest, follow-up test). A total of nine intervention units were carried out. The teaching in the intervention group was carried out by the author. In the control groups, the usual mathematics lessons were continued by the class teachers. The teachers in control group 1 were additionally given the word problems from the intervention to use them in their lessons. They received no further training.

4.2 Participants

The study was conducted in the 3rd grade. Two classes from each of three primary schools took part; the two classes from each school formed one of the three groups. All schools are in suburban locations. The intervention group comprised 35 children (18 boys, 17 girls). The average age at the first wave of measurement was 8 years and 4 months (youngest child: 7 years and 5 months; oldest child: 9 years). Control group 1 consisted of a total of 43 children (21 boys, 22 girls). The average age was 8 years and 4 months (youngest child: 7 years and 8 months; oldest child: 9 years and 10 months). Control group 2 consisted of a total of 46 children (11 boys, 35 girls). The average age was 8 years and 4 months (youngest child: 7 years and 6 months; oldest child: 10 years and 2 months). A total of 33 children in each group took part on all six test days.

4.3 Items

Both the intervention items and the test items were developed as word problems based on schoolbook tasks according to the curriculum. The main criterion for the development was the extent to which the formulation of the text suggests the graphic representation of the mathematical structure. Three types can be distinguished (see Table 1): Word problems of type A are characterized by the fact that the structurally relevant objects can be directly drawn, and their arrangement is described in the text. In contrast, in word problems of type C, the structurally relevant objects cannot be drawn directly on the basis of their physical properties. For a graphic representation, signs for these objects and their arrangement must be invented. For type B tasks, some but not all structurally relevant objects can be drawn directly.

Table 1 Examples for three types of word problems

Another criterion was to develop challenging tasks for the learners through the arithmetic or semantic structures of the tasks. For example, tasks for comparison situations or linear equations were developed. An overview of all items used in the intervention and the paper and pencil test can be found in Table 6 in the Appendix.

4.4 Intervention

In the intervention, reflections (Freudenthal, 1991) should enable learners to develop their competences in graphic representation. The intervention was divided into two phases, each taking place on a weekly basis (Ott, 2018). In phase 1, both classes of the intervention group received a letter from the author informing them of the word problem of the week. Each student was asked to generate a graphic representation for this given word problem containing everything that is important for him or her to understand and solve the word problem. In order for the students to make their graphic representations as complete as possible (Cox, 1999), the instruction was to generate a graphic representation that is understandable for others. In case the students were able to solve the problem, they were also required to note the solution. The students generated their graphic representations on their own without further assistance during the free-work periods, which took place every morning in the classes. Each student dropped the document with his or her own graphic representation into a class mailbox, which was emptied by the author at the end of the week. From these learner-generated graphic representations, a maximum of three graphic representations were selected by the author for phase 2 of the intervention. The selection was made so that the examples differed as much as possible in the way the drawing was done. Depending on which graphic representations were to be found in the children’s documents, an attempt was made to select examples with different mathematical structure, mathematical matching, or degree of abstraction.

In phase 2, these selected graphic representations formed the basis of the reflection discussions with the whole class. Enlarged copies of the selected children’s documents were attached one after the other to the blackboard. The whole class sat down in front of the blackboard. The aim of the reflection was to collectively explain the shown graphic representation and thus to try to understand the point of view of the author of the graphic representation. The author of the graphic representation was allowed, if he or she so wished, to comment on his or her graphic representation at the end of the reflection. Otherwise, the representation remained anonymous. Since the interpretation of the graphic representations is a challenge for the children, each graphic representation was first examined and analyzed individually. The reflection process was supported and stimulated by the author, in order to encourage a change in perspective (Freudenthal, 1991; Schülke, 2013): “What was the child probably thinking while making the math drawing? … What do you particularly like about the math drawing? Why? … What do you suppose: Why did the child draw the things in the math drawing, the relationships between things, and the labels like this?” In the course of reflection, the students also sometimes wished to improve on the graphic representations and discussed different possibilities. This was the case, for example, if the graphic representations were mathematically incomplete. Once all the graphic representations on the blackboard had been individually reflected in this way one after the other, the children were asked to compare them with each other and work out similarities and differences. The comparison in turn could relate to the way the drawing was done, how the mathematical structure was depicted, the mathematical matching or the degree of abstraction. 
The comparison also promotes a change in perspective and hence initiates a reflection process (Freudenthal, 1991; Schülke, 2013). The children also discussed why one or the other graphic representation fits the given word problem particularly well. In the discussions, they referred to both mathematical and content aspects of the word problem. Furthermore, they discussed the possibility of seeing the solution of the word problem in the graphic representation as well as the importance of the content-related references in the graphic representations. Some children preferred more abstract graphic representations and focused on the mathematical aspects. Others preferred more concrete graphic representations as the content of the word problem was important to them.

4.5 Paper and pencil test

The same paper and pencil test was performed at each test time. The test consisted of eight word problems. Three of them are shown in Table 1. The mathematical structures of six test items are similar to those of the intervention items. Two items show another mathematical structure. Four test items are formulated similarly to intervention items and differ only in terms of content, and the other four differ in both respects (see Table 6 in the Appendix). The instruction was the same as in the intervention. Testing took place on two successive days, with four test items each day. The tests were carried out by trained test leaders and the author.

4.6 Analysis

In a preliminary study (Ott, 2016, 2017), an analysis tool for graphic representations for word problems was developed. The analysis tool makes it possible to clearly assign each learner-generated graphic representation to a category according to mathematical structure, mathematical matching, and the degree of abstraction (see Section 2.3). The good interrater reliability of K = 0.81 (mathematical structure), K = 0.99 (mathematical matching with regard to the measured values), K = 0.96 (mathematical matching with regard to the measuring units), K = 0.99 (mathematical matching with regard to the operations), and K = 0.90 (degree of abstraction) allows this analysis tool to be used in the study presented here. The objectivity of the evaluation is guaranteed by the standardized procedure specified in an analysis guideline (Ott, 2016, 2017).
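For readers unfamiliar with agreement coefficients, the following sketch shows how a chance-corrected agreement value can be computed for two raters. It assumes that the reported K denotes Cohen's kappa, which the text does not state explicitly; the function name and the example codes are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same documents."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of documents")
    n = len(rater_a)
    # Observed proportion of documents on which the raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Made-up codes (1 = feature present, 0 = absent) for ten documents.
codes_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
codes_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(codes_a, codes_b), 2))  # 0.8
```

In the made-up example the raters agree on nine of ten documents, but half of that agreement would be expected by chance, so the coefficient is lower than the raw agreement rate.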

5 Results

The 2780 documents produced by the children were coded by two raters. The interrater reliabilities at the three test times varied between K = 0.95 and K = 0.98 and are thus to be regarded as very good. Three graphic representations in Fig. 1 show the development of the graphic representations of one particular student from pretest (e) via posttest (d) to follow-up test (a). Different qualitative developments can be seen in Ott (2016, 2020). The focus here is on the quantitative part of the study. In order to stay conservative, all statistical hypotheses were tested against an alpha level of 5% (two-tailed). Therefore, the reduced sample (N = 33 in each group) is used. The criterion of a sample size > 30 in each group, as a prerequisite for performing a two-way analysis of variance (ANOVA), is thus fulfilled (Bortz & Schuster, 2010). A factorial ANOVA was conducted for each of the three key features of graphic representations and for the solution rates to compare the main effects of time of testing and group and the interaction effect between time of testing and group. Each characteristic of interest is coded 1 if it is present and 0 if it is not; for each test time, these codes are summed over all eight items. Table 2 shows the main and interaction effects.
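The scoring rule can be sketched as follows. The function name and the example codes are hypothetical; the sketch only illustrates that each child receives a total between 0 and 8 per characteristic and test time.

```python
def total_score(item_codes):
    """Sum the 0/1 codes of one child over the eight items of one test time."""
    if len(item_codes) != 8 or any(code not in (0, 1) for code in item_codes):
        raise ValueError("expected eight codes, each 0 or 1")
    return sum(item_codes)


# Made-up codes for one child and one characteristic at two test times.
pretest = [0, 0, 1, 0, 0, 1, 0, 0]
posttest = [1, 1, 1, 0, 1, 1, 0, 1]
print(total_score(pretest), total_score(posttest))  # 2 6
```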

Table 2 Main effects of time of testing (TZP) and group (G) and interaction effects (TZP × G)

5.1 Mathematical structure

In a first step, the representation of the signs for structurally relevant objects and their arrangement is tested. In a second step, only the representation of the relationships is examined more specifically. Table 3 shows the development of the mean values of the groups in comparison. Since the Mauchly test indicates that the sphericity assumption is violated, the Greenhouse-Geisser correction was applied.
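The Greenhouse-Geisser correction multiplies the degrees of freedom of the within-subject tests by an estimate ε, which lies between 1/(k − 1) and 1 for k test times. The sketch below shows this for a design with three groups, three test times, and 99 children (33 per group). It assumes that time of testing is treated as a within-subject factor and group as a between-subject factor; the value ε = 0.9 is made up, since the estimate is not reported in this section.

```python
def greenhouse_geisser_df(epsilon, n_subjects, n_groups, n_times):
    """Corrected degrees of freedom for the within-subject tests (sketch)."""
    lower_bound = 1.0 / (n_times - 1)
    if not lower_bound <= epsilon <= 1.0:
        raise ValueError("epsilon must lie between 1/(k - 1) and 1")
    # Uncorrected degrees of freedom of a groups x time design.
    df_time = n_times - 1
    df_interaction = (n_groups - 1) * (n_times - 1)
    df_error = (n_subjects - n_groups) * (n_times - 1)
    # The correction scales all three by the same factor.
    return {
        "time": epsilon * df_time,
        "time x group": epsilon * df_interaction,
        "error": epsilon * df_error,
    }


# Hypothetical epsilon; 3 groups x 33 children, 3 test times.
for effect, df in greenhouse_geisser_df(0.9, 99, 3, 3).items():
    print(f"{effect}: {df:.1f}")
```

With ε = 1 the degrees of freedom are unchanged, which corresponds to the case in which sphericity holds.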

Table 3 Mean values and standard deviations with regard to the mathematical structure (N = 33 in each group)

5.1.1 Representation of structurally relevant objects and their relationships

The main effects of time of testing and group are statistically significant at the .05 significance level (see Table 2). The interaction effect is also significant (p < .001). The effect size of the interaction (partial η² = .195) is classified as strong (Cohen, 1988). The pairwise comparisons with the Bonferroni correction show no significant differences (ps > .05) at the pretest time. At the posttest time, there are significant differences between the intervention group and control group 1 (p < .001) and between the intervention group and control group 2 (p = .002). Significant differences between the intervention group and control group 1 (p < .001), as well as between the intervention group and control group 2 (p = .007), are also observed in the follow-up test. The remaining pairwise comparisons show no significant differences (ps > .05) (see Fig. 3a).
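Partial eta squared relates the sum of squares of an effect to the sum of that effect and its error term. The sketch below shows the computation and a classification using the commonly cited benchmarks of .01, .06, and .14 attributed to Cohen (1988). The function names and the sums of squares in the example are made up; whether the authors applied exactly these cut-offs is an assumption.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Share of effect variance in effect-plus-error variance."""
    if ss_effect < 0 or ss_error <= 0:
        raise ValueError("sums of squares must be non-negative, error positive")
    return ss_effect / (ss_effect + ss_error)


def classify_effect(eta_p2):
    """Label an effect size using the conventional benchmarks (assumed here)."""
    if eta_p2 >= 0.14:
        return "strong"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


print(classify_effect(0.195))  # strong (reported interaction effect)
print(classify_effect(partial_eta_squared(20.0, 80.0)))  # strong (made-up sums of squares)
```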

Fig. 3 Group interaction with regard to the structurally relevant objects and their relationships (a) and with regard to the relationships only (b) (N = 33 each group; **p < .01, ***p < .001)

Hypothesis H1(a) can thus be confirmed. After the intervention, the intervention group more frequently represents structurally relevant objects and relationships in the documents than the control groups.

5.1.2 Representation of relationships only

Both main effects are also statistically significant at the .05 significance level (see Table 2). The interaction effect is significant as well (p < .001). The effect size of the interaction (partial η² = .237) is classified as strong (Cohen, 1988). At the pretest time, pairwise comparisons with the Bonferroni correction reveal no significant differences (ps > .05). At the posttest time, there are significant differences between the intervention group and control group 1 (p = .001) and between the intervention group and control group 2 (p < .001). Significant differences between the intervention group and control group 1 (p < .001), as well as between the intervention group and control group 2 (p = .002), are also found in the follow-up test. At the follow-up test time, a significant difference also occurs between control group 2 and control group 1 (p = .001). The remaining pairwise comparisons show no significant differences (ps > .05) (see Fig. 3b). Hypothesis H1(a) can also be confirmed for the specification on relationships. After the intervention, the intervention group more frequently represents relationships in the documents than the control groups.
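With three groups, there are three pairwise comparisons at each test time. The following sketch illustrates the Bonferroni correction in its adjusted-p form, in which each raw p value is multiplied by the number of comparisons and capped at 1. The raw p values are made up, and whether the p values reported here are raw or adjusted is not stated in the text.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


groups = ["intervention", "control 1", "control 2"]
pairs = list(combinations(groups, 2))  # three pairwise comparisons

raw_p = [0.0004, 0.012, 0.30]  # made-up raw p values, one per pair
for (g1, g2), p_adj in zip(pairs, bonferroni(raw_p)):
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"{g1} vs. {g2}: adjusted p = {p_adj:.4f} ({verdict})")
```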

5.2 Mathematical matching

The mathematical matching is tested separately for measured values, measuring units, and operations. Table 4 shows the development of the mean values of the groups in comparison. The Mauchly test indicates that the sphericity assumption is met.

Table 4 Mean values and standard deviations with regard to the mathematical matching (N = 33 in each group)

5.2.1 Measured values

The main effect of time of testing is significant (p < .001), whereas the main effect of group is not significant (p = .093) (see Table 2). The interaction effect is significant (p < .001). The effect size of the interaction (partial η² = .204) is classified as strong (Cohen, 1988). The pairwise comparisons with the Bonferroni correction do not show significant differences (ps > .05) at the pretest and posttest times. The follow-up test shows a significant difference between the intervention group and control group 1 (p < .001) and between the intervention group and control group 2 (p = .009). There is no significant difference between the two control groups (p > .05) (see Fig. 4a).

Fig. 4 Group interaction with regard to the appropriate measured values (a), the appropriate measuring units (b) and the appropriate operations (c) (N = 33 in each group; *p < .05, **p < .01, ***p < .001)

Hypothesis H1(b) can be confirmed for the follow-up test time. Three months after the intervention, the intervention group more frequently observes the mathematical matching of the measured values in the documents than the control groups.

5.2.2 Measuring units

Both main effects are statistically significant at the .05 significance level (see Table 2). The interaction effect is significant (p < .001) too. The effect size of the interaction (ηp2 = .337) is to be classified as strong (Cohen, 1988). The pairwise comparisons with the Bonferroni correction reveal a significant difference between the intervention group and control group 1 at the pretest time (p = .006). The documents of the children in the intervention group have the lowest values for the matching of measuring units. There is also a significant difference between the intervention group and control group 1 at the posttest time (p = .049). The disadvantage for the intervention group observed during the pretest turns into an advantage at the posttest time. The follow-up test reveals significant differences between the intervention group and control group 1 (p < .001) and between the intervention group and control group 2 (p < .001). The remaining pair comparisons show no significant differences (ps > .05) (see Fig. 4b). Hypothesis H1(b) can be confirmed for the follow-up test time. Three months after the intervention, the intervention group more frequently observes the mathematical matching of the measuring units in the documents than the control groups.

5.2.3 Operations

Regarding the operations, both main effects are statistically significant at the .05 significance level (see Table 2). The interaction effect is significant (p < .001) too. The effect size of the interaction (ηp2 = .233) is classified as strong (Cohen, 1988). The pairwise comparisons with the Bonferroni correction reveal significant differences for the posttest time between the intervention group and both control groups (ps < .001). There are also significant differences between the intervention group and control group 1 (p < .001) and between the intervention group and control group 2 (p = .001) at the follow-up test time. The remaining pair comparisons show no significant differences (ps > .05) (see Fig. 4c). Hypothesis H1(b) can be confirmed. After the intervention, the intervention group more frequently observes the mathematical matching of the operations in the documents than the control groups.
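The classification of the interaction effect sizes reported so far can be made explicit in a short sketch. It assumes the commonly cited benchmarks for ηp2 of .01, .06, and .14, which are usually attributed to Cohen (1988). The labels are chosen to match the wording of this section. The helper names are hypothetical, and the conversion from an F value is the standard identity, not a procedure described in the study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Standard identity: eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def classify_partial_eta_squared(eta_p2):
    """Label an effect size using assumed benchmarks of .01, .06, and .14."""
    if eta_p2 >= 0.14:
        return "strong"
    if eta_p2 >= 0.06:
        return "moderate"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


# Interaction effect sizes as reported in the text of this section.
reported = {
    "relationships": 0.237,
    "measured values": 0.204,
    "measuring units": 0.337,
    "operations": 0.233,
}

labels = {name: classify_partial_eta_squared(value)
          for name, value in reported.items()}

for name, label in labels.items():
    print(f"{name}: eta_p^2 = {reported[name]:.3f} ({label})")
```

Under these assumed benchmarks, all four interaction effects fall into the highest category, in line with the classification given in the text.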

5.3 Degree of abstraction

The two indicators of the degree of abstraction are tested individually. Table 5 shows the development of the means of the groups in comparison. Since the Mauchly test indicates that sphericity is violated, the Greenhouse-Geisser correction is applied.
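How the Greenhouse-Geisser correction works can be sketched as follows. The correction estimates a factor ε from the covariance matrix of the repeated measures and multiplies the degrees of freedom of the within-subject tests by it. The covariance matrix below is hypothetical, and the function name is an assumption. In a design with groups, statistical software typically uses the covariance matrix pooled across groups. The sketch does not reproduce the analysis of the study.

```python
def greenhouse_geisser_epsilon(cov):
    """Estimate epsilon from a k x k covariance matrix of repeated measures.

    The matrix is double-centered; epsilon is then
    trace(C)^2 / ((k - 1) * sum of squared entries of C).
    """
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand_mean = sum(row_means) / k
    centered = [
        [cov[i][j] - row_means[i] - col_means[j] + grand_mean
         for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_of_squares = sum(value * value for row in centered for value in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


# Hypothetical covariance matrix for pretest, posttest, and follow-up test.
example_cov = [
    [4.0, 2.0, 1.0],
    [2.0, 5.0, 3.0],
    [1.0, 3.0, 6.0],
]
epsilon = greenhouse_geisser_epsilon(example_cov)

# With k = 3 testing times, the uncorrected effect df is k - 1 = 2.
corrected_df_time = epsilon * (len(example_cov) - 1)
print(f"epsilon = {epsilon:.3f}, corrected df (time) = {corrected_df_time:.3f}")
```

ε equals 1 when sphericity holds exactly and cannot fall below 1/(k − 1), so the correction only ever reduces the degrees of freedom and thereby makes the test more conservative.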

Table 5 Mean values and standard deviations with regard to the degree of abstraction (N = 33 in each group)

5.3.1 Indicator 1

The main effect of testing time is significant (p < .001) and the main effect of group is not significant (p = .157). The interaction effect is significant (p = .023) (see Table 2). The effect size of the interaction (ηp2 = .060) is to be classified as moderate (Cohen, 1988). The pairwise comparisons with the Bonferroni correction do not show significant differences (ps > .05) at the pretest and posttest times. The follow-up test shows a significant difference between the intervention group and control group 1 (p = .036). The remaining pairwise comparisons at the follow-up test show no significant differences (ps > .05). Hypothesis H1(c) has to be partially rejected for indicator 1. After the intervention, the intervention group does not focus more frequently on the structurally relevant objects in the graphic representations than the control groups, with the exception of control group 1 at the follow-up test.

5.3.2 Indicator 2

With the exception of the main effect of testing time, no effect is statistically significant at the .05 significance level (see Table 2). Hypothesis H1(c) has to be rejected for indicator 2. After the intervention, the intervention group does not focus more frequently on the mathematically relevant qualities of the structurally relevant objects than the control groups. Descriptive results suggest that it is the items that are decisive for the degree of abstraction, but not the mathematical structure (Ott, 2016).

5.4 Solution rates

The Mauchly test indicates that the sphericity assumption is met. With the exception of the main effect of testing time, no effect is statistically significant at the .05 significance level (see Table 2). Hypothesis H2 has to be rejected. After the intervention, the intervention group does not solve word problems correctly more often than the control groups.

Descriptive results (Ott, 2016) show that in the intervention group, the continuous increase in correct solutions is accompanied by a decrease in the proportion of false solutions and in the proportion of documents in which no solution is specified. This picture does not appear in the control groups.

6 Discussion

6.1 Summary of the results

The results show a differentiated picture with regard to the key features of graphic representations. Since the intervention group did not receive any support in problem solving or graphic representation other than the intervention itself during the period of the study, the observed changes can plausibly be attributed to the intervention.

6.1.1 Mathematical structure and mathematical matching

At the time of the pretest, the mean values for the representation of mathematical relationships in all three groups are low. This finding confirms earlier observations (Hasemann, 2006; Ott, 2016). After the intervention, the intervention group more often generates object-related and, above all, more diagrammatic graphic representations. The findings on mathematical matching with regard to the operations show that, after the intervention, the intervention group not only represents mathematical relationships noticeably more often than the control groups but also pays significantly more attention to the mathematical matching of the operations given in the word problems. The learners also pay more attention to the depiction of the given measured values and measuring units. Here, too, they differ significantly from the control groups. Since the interaction effect is significant, these developments can be attributed to the intervention. The word problems alone cannot be regarded as decisive for the development, given the significant difference from control group 1. The size of the interaction effect shows that the differences between the groups are also practically significant. Since the development does not stop after the intervention, it can be regarded as sustainable.

These findings are in line with those of Van Essen and Hamaker (1990), who report that learner-generated drawings are richer and more oriented towards mathematical relationships after a short period of training. In the study presented here, the richness is reflected in an increased representation of the measured values and measuring units. These results also complement findings by Van Dijk et al. (2003b): a combination of the self-generation of graphic representations and processes of reflection on them in class may promote the graphic representation of mathematical structures with mathematical matching to the given word problem.

6.1.2 Degree of abstraction

Regarding the degree of abstraction, the findings do not meet expectations. After the intervention, the intervention group does not show a higher degree of abstraction than the control groups. A low degree of abstraction is a typical way of drawing among children (Sherin, 2000). However, the result is in contrast with the findings of other studies (Lopez Real & Veloo, 1993; Van Dijk et al., 2003a; Van Essen & Hamaker, 1990) in which the graphic representations become more formalized and schematic. The results must be interpreted against the background of the intervention: the children were asked to record everything of importance to them for solving the word problem. The graphic representation should be understandable for someone else. When representations are made for others, they tend to be richer (Cox, 1999). The children seem to consider realistic drawings to be more understandable for others. This complements the findings of Pantziara et al. (2009), who found that learners tried to transform schematic representations into pictorial representations to interpret them. No statement can be made about what the degree of abstraction would have been had the graphic representations been created as heuristics.

However, with a relatively constant degree of abstraction, the children in the intervention group paid more attention to the mathematical structure and matching after the intervention. This can be interpreted to mean that the children do not neglect the content of a word problem but establish more flexible mathematical relationships. The results suggest that the development of the degree of abstraction is independent of the representation of the mathematical structure and matching. This should be examined in further investigations. Nevertheless, it must be taken into account that mathematically irrelevant information can make it difficult to recognize mathematical structures and therefore poses an obstacle to problem solving (Presmeg, 1986). This could help to explain the results with regard to the solution rates.

6.1.3 Solution rates

The intervention group does not solve the word problems correctly significantly more often than the control groups. This is in contrast with the results of Van Dijk et al. (2003a), in whose study designing models using a co-construction approach had a positive effect. However, in the study presented here, the graphic representations were generated as documents for others and not as instruments. In addition, neither the test nor the intervention focused on problem solving. This study shows that generating a mathematically matching diagrammatic representation does not guarantee that the problem will be solved correctly. This is in line with the results of Van Essen and Hamaker (1990) and could be interpreted to mean that the children regard the graphic representation and the task solution as independent of each other. One reason for this can be seen in the instruction, which does not use graphic representations as a heuristic. A further cause can be seen in the intervention, in which the focus was not on the connection between graphic representation and correct solutions. Further research with a modified instruction, which includes problem solving to a greater extent, is needed. According to Hembree (1992), another reason could be the children’s age: positive effects on problem solving through interventions in graphic representation do not yet occur at primary school age.

6.2 Limitations

No study comes without limitations. First, the sample of the study is not globally representative, but can be regarded as characteristically representative with regard to graphic representations in mathematics lessons (Bortz & Döring, 2006). Second, for the development of representation skills, the time frame of nine intervention units, which was set for practical reasons, is relatively short. The multi-group design makes it possible to attribute effects to the intervention and to control for possible disturbance variables, such as age-related development, habituation to the test, the influence of the intervention tasks themselves, and class composition. Nevertheless, the possibility of interference cannot be ruled out in a study under almost real conditions. The development of control group 1 could be influenced by such effects. To minimize external influences, the intervention group was taught by the author. It is known that studies of this kind show slightly higher effects (De Boer, Donker, & Van der Werf, 2014). In the future, an independent replication should show whether the same effects occur when teachers carry out the intervention themselves. Here, a follow-up test was carried out to analyze long-term effects. In addition, it would make sense to examine the children’s class discussions more closely to gain a better understanding of the development processes. The qualitative part of the study presented here provides some information in this regard (Ott, 2016, 2020). Furthermore, ceiling effects are partly observable in the results, as is also reflected in the change in the scatter over time. This is due to the fact that the test does not differentiate any further at the upper end. Finally, no statement can be made as to the extent to which the children use graphic representations as heuristics for problem solving.

7 Conclusion

In the intervention presented here, learners of the 3rd grade first generated their own graphic representations for a given word problem as documents for others and then reflected on some of these learner-generated graphic representations in class. The aim of the reflection was to collectively explain the shown graphic representation and thus to try to understand the point of view of the author of the graphic representation. The results suggest that learners are enabled by this intervention to more frequently pay attention to a mathematically appropriate structure in their graphic representations for word problems. The results show that this effect is also sustainable. Even if the solution rates did not improve to a statistically significant extent, the basis for the use of graphic representations as an instrument for problem solving can still be laid. All in all, teaching that combines the self-generation of graphic representations and reflection processes on them in class seems to be positive for the development of graphic representation competences.