Introduction

This study investigated the relationship between grade and the ability to integrate text and pictures in multimedia learning. Children from grade 5 (around 11 years old) and grade 8 (around 14 years old) in secondary schools were recruited. As they are in early adolescence, a period in which a sense of self-esteem, competence, and individuality develops (Eccles, 1999), they are an appropriate sample for comparing grade differences in multimedia reading (e.g., McElvany et al., 2008; McLeod et al., 2019; Retelsdorf et al., 2011). According to previous studies (Sweller, 1988; Zhao et al., 2020a), an incorrect response to a question can indicate high cognitive load and increased demands on attentional resources. With increasing grade, learners can better regulate their learning behavior when they encounter difficulty (Zimmerman & Martinez-Pons, 1990). It is thus probable that learners from different grades differ in premature termination prior to correct vs. incorrect responses to a question. Accordingly, we compared participants from grade 5 vs. 8 prior to correct vs. incorrect responses. Completion time and survival analysis (Wright, 2000) were used to explore which group of participants tends toward premature termination. The fixation patterns on texts and pictures were used to shed light on the reasons for premature termination.

Multimedia learning

Multimedia learning is defined as learning from texts and pictures. Texts can be presented in written or auditory form. Pictures (such as data graphs, realistic pictures, maps) can be presented in static or dynamic form. Texts and pictures differ in two main facets: the principle of presentation and the kind of processing (Schnotz, 2014). In terms of presentation, texts consist of arbitrary symbols, and pictures consist of icons that are less arbitrary. The representational function of texts is the description of the subject matter, whereas the representational function of pictures is the depiction of the subject matter. For instance, the word “dog” describes a carnivorous mammal that has four legs and accurate odor perception and is often kept by people as a pet to hunt or guard things (Cambridge Dictionary, n.d.). A picture of a dog shows only one kind of dog, such as a husky. In terms of processing, texts have hierarchical structures governed by grammatical rules and semantics. Compare, for instance, “Three turtles rested beside a floating log, and a fish swam beneath them” with “Three turtles rested on a floating log, and a fish swam beneath them” (cf. Bransford et al., 1972). The two sentences have the same grammatical structure but do not lead to the same mental model. Texts are usually comprehended in a linear way, such as word by word. In contrast, pictures offer more degrees of freedom in processing. They can be easily accessed from any direction. Therefore, texts suit the cognitive function of conceptual guidance and construction of a mental model. Pictures suit the cognitive function of the external support of the adaptive construction of the internal model of the subject matter (Schnotz, 2014; Schnotz et al., 2014).

The integrative processing of texts and pictures is proposed to take place in a verbal channel as well as in a pictorial channel, according to the theory of dual coding (Clark & Paivio, 1991; Paivio, 1986, 2006), the cognitive theory of multimedia learning (Mayer, 2001, 2005, 2011), and the integrative model of text-picture integration (Schnotz & Bannert, 2003). The verbal processing in the verbal channel requires constructing the propositional representation of the text by making references of the objects and events being described in the text using relevant world knowledge. The pictorial processing in the pictorial channel requires constructing the mental representations by structure mapping based on analogies between the external and internal depictive representations. In multimedia learning, verbal processing and pictorial processing interact continuously with each other regarding mental model construction and adaptation (Zhao et al., 2014; Zhao et al., 2020b). Thus, on the one hand, instructors should appropriately use texts and pictures based on how our mind works (Carney & Levin, 2002; Levie & Lentz, 1982; Levin & Mayer, 1993; multimedia principle, Mayer, 2005). On the other hand, learners should have multimedia literacy to be able to encode and interpret the pictorial information. They should be able to negotiate and create meaning from information given in the pictorial form (Van Leeuwen & Kress, 1990). Children from grade 5 to grade 8 are gradually developing the skill of text-picture integration. Previous studies (Hochpöchler et al., 2013; Schnotz & Wagner, 2018) have suggested that grade is related to the ability of text-picture integration. In multimedia learning, 5th graders are slower than 8th graders in text reading, and 5th graders pay less attention to pictures than 8th graders do. In the current study, samples of children from grades 5 and 8 were recruited to investigate the relationship between grade and multimedia learning.

Cognitive theories of multimedia learning only suggest the separate processing of texts and pictures in the human mind. However, they do not point out the unbalanced usage of texts and pictures in different cognitive processing (i.e., the mental model construction and adaptive mental model specification). Due to the distinctive cognitive functions of texts and pictures, the attention can be shifted differently to texts or pictures in mental model construction and in adaptive mental model specification (Schnotz et al., 2014; Schnotz & Wagner, 2018). Positioning the question before vs. after the material is one possible way to trigger the processing of mental model construction vs. adaptive mental model specification (cf. Rothkopf, 1966; Zhao et al., 2014; Zhao et al., 2020b). When the question is positioned after the material, participants have already experienced the materials and constructed the mental model (almost) completely. When the question is displayed, participants are in the processing of adaptive mental model specification to update and adapt the mental model to the task demands. In this adaptive phase, the cognitive processing should be shifted more towards pictures compared to texts, as pictures can be more easily used as an external representation for adapting the mental model (cf. Lindner et al., 2017; Schnotz & Wagner, 2018; Zhao et al., 2020b). When the question is placed before the material, the processing can be selective yet participants have to first understand the material. It thus triggers first the initial mental model construction and then the adaptive mental model specification. Accordingly, the cognitive processing should be shifted more towards texts at the beginning and gradually towards pictures at the end, as texts guide the initial mental model construction conceptually and pictures support the mental model adaptation (Schnotz & Wagner, 2018; Zhao et al., 2020b).

Tendency to give up prematurely

In the context of learning, premature termination is a tendency to give up before the knowledge is acquired or the problem is solved. This can be rooted in a lack of metacognitive skills. Metacognitive skills, such as monitoring and controlling the cognitive process, are useful for balancing between text and pictures (Riemer & Schrader, 2019). Children who have higher metacognitive skills are more likely to be independent learners. These learners have the confidence to discover the information based on their own needs (Djudin, 2017; van Kraayenoord & Paris, 1996). Furthermore, feelings of helplessness may play a role in premature termination. Previous studies have shown that younger children are more prone than older children to feeling helpless when encountering difficulties (Borkowski et al., 1990; Chan, 1994; Paris & Winograd, 1990).

A speed-accuracy trade-off can occur during multimedia learning. Kail (1985) tested participants at ages 11, 14, and 19 with two sets of stimuli presented either upright or rotated. Participants judged whether the stimuli were identical or mirrored. They were instructed to emphasize either accuracy, fast response, or fast and accurate response. The speed-accuracy trade-off was more pronounced for younger participants (11-year-olds) than for older participants (14- and 19-year-olds). Previous studies have shown that individuals from lower grades (e.g., 5th graders) are not yet able to process the presented information as fluently as individuals from higher grades (e.g., 8th graders) (Hochpöchler et al., 2013; Schnotz & Wagner, 2018). Potentially, this results in a larger speed-accuracy trade-off for 5th graders. When 5th graders complete the multimedia tasks quickly, it is very likely that they will make errors.

Survival analysis

One of the potential methods to capture the premature termination is the survival analysis (Wright, 2000). Survival analysis is a modeling of occurrence of events over time. It estimates the probability of the occurrence of an event that has not yet happened. It censors the proportion of participants who have not experienced the event of interest. Survival analysis captures the probability of the occurrence of a one-time-change for a given amount of time passed. It has, for instance, been used in medical or psychotherapy treatment (Corning & Malofeeva, 2004) and in visual search task analysis (Reingold & Glaholt, 2014).

Recently, survival analysis has been used in the educational context. Hlioui et al. (2020) examined student behavior in a massive open online course by measuring the time at which students dropped out of the course. Bacca-Acosta and Avila-Garzon (2021) investigated the proportion of students who stopped using a mobile application for second language learning. Investigating problem-solving, Blech et al. (2019) used survival analysis to track task completion in the course of processing by looking at the completion rate of participants per completion-time unit. Survival analysis made it possible to document that, with vs. without think-aloud demands, a similar percentage of participants completed the problem within 30 s, 60 s, 90 s, etc.

This study employs survival analysis to track in more detail where differences among students from different grades confronted with multimedia material originate. Prior work (Schnotz & Wagner, 2018; Zhao et al., 2020b) has documented average group differences in the rate of correct responses to items as well as in time spent on text and picture materials. Survival analysis can help to explore whether group differences in outcome in part originate from a less balanced allocation of attention to text vs. picture material over the time course of dealing with a unit and from giving up processing the unit in favor of a premature attempt to answer the item. In the current study, the event of interest was the time at which participants completed the multimedia task. The longest reading time for a specific unit (e.g., Savannah) was divided into ten time bins. For each bin, we examined what percentage of participants had not yet completed the task.

Research questions

This study examines the relationship between grade (5 vs. 8) and multimedia learning, prior to correct vs. incorrect responses. The time allocated to the learning material overall, and specifically to text and picture, is analyzed separately for units that end with an incorrect vs. a correct answer. An incorrect answer suggests that the learner did not generate an appropriate mental model. Extra time and/or a different weighting of fixations on text and picture might have yielded a better model and outcome. Such additional cognitive processing is speculated to be more prevalent among 8th graders than among 5th graders. Eighth graders are more capable of decoding written texts, retrieving world knowledge in the process of comprehension, and using multimedia materials, and they spend more time searching for the relevant information and updating the mental model. In contrast, 5th graders may have difficulties in processing multimedia materials. They may give up looking for the relevant information early. The correct answer, in turn, indicates that the learners have established the appropriate mental model. No or little extra effort is needed to answer the question. As 5th graders have lower verbal abilities, they may need additional cognitive processing compared to 8th graders. Accordingly, it is probable that there is an interaction between grade (low vs. high) and accuracy (incorrect vs. correct) on task completion. This leads to the following question and hypothesis.

  1. Do learners from different grades differ in task completion time, prior to correct vs. incorrect responses to a question?

  • Hypothesis 1: The extent to which more time is spent prior to answering incorrectly compared to answering correctly is larger in 8th graders compared to 5th graders.

Moreover, it was assumed that students from different grades may have different fixation patterns on texts and on pictures. Displaying questions after the materials primarily triggers adaptive mental model specification. In adaptive mental model specification, pictures play a more important role than texts do (Zhao et al., 2020b). Positioning questions before the materials triggers both initial mental model construction and adaptive mental model specification. In initial mental model construction, texts play a more critical role than pictures do. Therefore, text processing should be dominant at first, and the picture processing should gradually increase in the course of multimedia processing. Students, who give up early and have incorrect answers, may not establish the mental model well and may have difficulties in adapting the mental model to the task demands. The inappropriate usage of texts and pictures should be more pronounced with learners from lower grades. Previous studies (Hochpöchler et al., 2013; Schnotz et al., 2014) have shown that learners from lower grades tend to focus less on texts in initial mental model construction and to focus less on pictures in adaptive mental model specification than their counterparts from higher grades. Therefore, the following question and hypotheses were proposed regarding the fixation patterns on texts and pictures.

  2. Do lower grade students who give up early and answer incorrectly take a disadvantageous approach to the use of images and texts from the start and therefore fail to build up a model and then adapt it later?

  • Hypothesis 2a: When the question is positioned after the material, the fixations of pictures for (1) 5th graders who give up early and give incorrect answers will be less than (2) the ones who spend normal time and give incorrect answers and (3) the ones who spend normal time and give correct answers.

  • Hypothesis 2b: When the question is positioned before the material, the fixations of texts and pictures for (1) 5th graders who give up early and give incorrect answers will be less than (2) the ones who spend normal time and give incorrect answers and (3) the ones who spend normal time and give correct answers.

Furthermore, as our sample of 5th and 8th graders was recruited from non-academic track as well as academic track schools, we also explored whether this characteristic would be related to the prevalence of premature termination.

Method

The data were re-analyzed from a published article (Zhao et al., 2020b).

Participants

One hundred forty-four secondary school students participated in the experiment. Half were 5th graders (Mage = 11.4, SD = 0.6; 28 females, 44 males), and the other half were 8th graders (Mage = 14.5, SD = 0.7; 44 females, 28 males). Half of the 5th graders and 8th graders were from the non-academic track (non-AT), and the other half were from the academic track (AT). Fifth graders had an average verbal ability of 48.96 (SD = 5.98; AT: M = 51.06, SD = 5.88; non-AT: M = 46.86, SD = 5.37) and spatial ability of 50.13 (SD = 9.60; AT: M = 52.97, SD = 9.22; non-AT: M = 47.28, SD = 9.24) as measured by the German version of the Cognitive Abilities Test for the 5th graders (Heller & Perleth, 2000). Eighth graders had an average verbal ability of 49.44 (SD = 7.43; AT: M = 52.61, SD = 7.49; non-AT: M = 46.22, SD = 5.92) and spatial ability of 49.82 (SD = 7.65; AT: M = 51.64, SD = 7.24; non-AT: M = 48.00, SD = 7.72) as measured by the test for the 8th graders. All participants had normal or corrected-to-normal vision.

Experimental design

The learning materials originated from authentic school textbooks in Geography and Biology for grades 5 to 8 in Germany (see Appendix). Six materials were randomly selected from a set of 40 units to test text-picture integration skills, and each unit had one question (Schnotz et al., 2010). They dealt with the structure of insect legs (67 words), the banana trade (91 words), the auditory ranges of animals and humans (130 words), pregnancy (143 words), the map of Europe (136 words), and the types of savannahs (168 words). The picture types in the materials were realistic pictures, data graphs, diagrams, and maps. Previous research has revealed that the structure of visualizations affects the structure of the corresponding emerging mental models (e.g., Schnotz & Bannert, 2003). This effect was controlled in the experiment by systematic rotation of the different visualizations across the experimental conditions. As shown in Fig. 1, all displays included an area of interest (AOI) of text on the right side, an AOI of picture on the left side, and an AOI of question on the lower left side.

Fig. 1

Example of a text-picture material on the topic Savannah (translated from German)

To increase the generalizability of our results, we varied the complexity of the questions based on Wainer’s (1992) taxonomy. The order of questions and the order of the materials were counterbalanced in a Latin square (see Table 1B in Appendix). All questions could only be answered correctly by combining information from the text and from the picture. Low-complexity questions required only element mappings between text and picture. For instance, to answer the question of what the amount of rainfall per year is in Accra (see unit Savannah in Appendix), participants have to scan the picture; find the data graph for Accra, which is on the top left; find the meaning of the axes in the text; and find the correct answer, “787 mm.” Medium-complexity questions required mappings of simple relations between the text and the picture. For instance, to answer the question of which plant can grow well in Enugu, participants have to scan the picture, find in the data graph that Enugu has 1661 mm of rainfall per year, search for the different amounts of rainfall per year required by the plants, compare them, and find the correct answer, “Yams.” High-complexity questions required mappings of complex relations between the text and the picture. For example, to answer the question of which plant can grow well in the most cities, regardless of the type of savannah (millet/manioc/cotton/peanut), participants should first search for the relevant text, “rainfall (blue) and temperature (red).” Then they search for “787 mm in Accra,” “1661 mm in Enugu,” “887 mm in Ouagadougou,” and “549 mm in Sinder” in blue in the picture. For each option, they find in the text the amount of rainfall per year for ideal growth: “millet: 180 to 700 mm,” “manioc: 500 to 2,000 mm,” “cotton: 700 to 1,500 mm,” and “peanut: 250 to 700 mm.” Then they compare the amount of rainfall for growing the crops to the rainfall per year in all cities. Finally, they give the answer “manioc,” as it can grow well in all the cities.
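The comparison required by the high-complexity Savannah question can be sketched in a few lines of Python. The rainfall values and growth ranges are those quoted above; the variable and function names are hypothetical, and treating the range limits as inclusive is an assumption made only for illustration.

```python
# Rainfall per year (mm) for each city, as quoted from the Savannah unit.
rainfall_mm = {"Accra": 787, "Enugu": 1661, "Ouagadougou": 887, "Sinder": 549}

# Ideal rainfall range (mm per year) for each answer option, as quoted from the text.
crop_range_mm = {
    "millet": (180, 700),
    "manioc": (500, 2000),
    "cotton": (700, 1500),
    "peanut": (250, 700),
}


def cities_supported(crop):
    """Return the cities whose yearly rainfall lies within the crop's range.

    Assumption: the range limits are treated as inclusive.
    """
    low, high = crop_range_mm[crop]
    return [city for city, mm in rainfall_mm.items() if low <= mm <= high]


# Count, for every option, in how many cities the crop can grow well.
counts = {crop: len(cities_supported(crop)) for crop in crop_range_mm}
best_crop = max(counts, key=counts.get)
print(counts, best_crop)
```

The sketch makes explicit that the item requires four range comparisons per answer option, which is what distinguishes it from the single look-up needed for the low-complexity question.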

To trigger the processing of mental model construction or adaptive mental model specification (Zhao et al., 2014; Zhao et al., 2020b), the position of questions was manipulated in two conditions: question-after-material condition vs. question-before-material condition. The two conditions were identical regarding the elements on the screen (i.e., the texts, pictures, and the question) but differed with respect to the prior engagement with the material and should therefore be characterized by distinct patterns of usage of text vs. picture. In the question-after-material condition (see Fig. 2), participants have been previously given the opportunity to process the materials without knowing the questions. When the question came, they adapted the constructed mental model to the task demands. This condition should mainly trigger the adaptive specification of the mental model that had been generated in the preceding phase when the question had not yet been known. In the question-before-material condition (see Fig. 2), participants have known the question from the start, but they lack knowledge about the material. The question-before-material condition hence triggered first the initial mental model construction and then the adaptive mental model specification in the course of processing.

Fig. 2

Design overview of the study

Procedure

After informed consent had been obtained from the parents, each child was tested individually in a lab environment with a Tobii XL60 24-inch eye tracker operating at 60 Hz. Participants first completed the verbal and spatial tests appropriate for their age (Heller & Perleth, 2000). Then, they watched an instructional video about what an eye tracker is, how it works, and what to be aware of during calibration and recording. All participants were informed that they should carefully read both texts and pictures in order to understand the content. Reading was self-paced. Participants pressed the space key to turn pages and pressed arrow keys to give answers. They received no feedback on whether their answers were incorrect or correct during the experiment, and they could not turn pages backwards.

During the eye tracking experiment, participants received one warm-up reading material in the question-after-material condition and one in the question-before-material condition. These materials helped participants to practice pressing the corresponding keys to turn pages and answer questions. Afterwards, the actual experiment started, in which participants read six materials and answered one question per material. Half of the participants received the first three materials in the question-after-material condition and the last three materials in the question-before-material condition. For the others, the order was reversed. Before each material, they were informed about the topic of the material they would receive (e.g., Savannah). In the question-after-material condition (see Fig. 2), participants were instructed that they would first receive a material without a question and that they should understand the content as well as they could. When they finished reading, they would receive the material again with a question, and they should press the arrow keys to give answers. In the question-before-material condition, participants were instructed that they would receive the topic of a material. They would then see a question, which they should try to understand but not yet answer. Afterwards, participants would receive a text-picture-blended material accompanied by the question that they had seen before, and they should give an answer.

Results

The average task completion time is first reported for incorrect vs. correct answers depending on grade and school track. Survival analyses help to locate where mean differences originate from. The fixation patterns on texts and pictures provide the information about text-picture integration. Note that analyses with gender did not reveal a significant effect (for more details, see Appendix).

Completion time

Question-after-material

We report data of 97 participants, as they had both correct and incorrect answers in the question-after-material condition. Fifth graders had an average completion time of 66.38 s (SD = 59.42 s) for incorrect answers and 47.19 s (SD = 32.83 s) for correct answers, whereas eighth graders had 69.36 s (SD = 62.45 s) for incorrect answers and 36.61 s (SD = 17.19 s) for correct answers. We conducted a mixed ANOVA with accuracy (incorrect vs. correct) as within-subjects factor and grade (5 vs. 8) and school track (academic vs. non-academic) as between-subjects factors. As illustrated in Fig. 3, there was a main effect of accuracy, F(1, 93) = 19.15, p < .001, ηp2 = .17, indicating shorter completion time with correct answers than with incorrect answers. However, there was no main effect of grade, F < 1, and no significant interaction effect of accuracy × grade, F(1, 93) = 1.68, p = .20, ηp2 = .02. Interestingly, the effect of school track reached significance, F(1, 93) = 6.60, p = .01, ηp2 = .07, and so did the interaction of accuracy × school track, F(1, 93) = 8.01, p = .006, ηp2 = .08. This indicated that the lengthening of completion times for incorrect as compared to correct answers was reduced for non-AT students, whereas AT students differentiated their time investment between correct and incorrect answers to a larger extent. Descriptively, non-AT 5th graders invested a similar amount of time on correct and incorrect answers (i.e., items difficult for them). No other effect was found, including accuracy × school track × grade, Fs < 1. Thus, while the hypothesized accuracy × grade interaction was not obtained, school track seemed to be related to giving up early.
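As a reading aid, the reported partial eta squared values can be cross-checked against the F statistics and degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch; the function name is hypothetical, and it is not the software used for the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the question-after-material condition.
reported_effects = {
    "accuracy": (19.15, 1, 93),
    "accuracy x grade": (1.68, 1, 93),
    "school track": (6.60, 1, 93),
    "accuracy x school track": (8.01, 1, 93),
}

for name, (f_value, df_effect, df_error) in reported_effects.items():
    print(name, round(partial_eta_squared(f_value, df_effect, df_error), 2))
```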

Fig. 3

Completion time of academic track (AT) and non-academic track (non-AT) students from grade 5 (G5) and grade 8 (G8) for incorrect and correct answers

Question-before-material

We report data of 110 participants, as they had both correct and incorrect answers in the question-before-material condition. Fifth graders had an average task completion time of 106.64 s (SD = 52.79 s) for incorrect answers and 95.75 s (SD = 49.79 s) for correct answers, whereas eighth graders had 105.90 s (SD = 45.59 s) for incorrect answers and 66.23 s (SD = 29.42 s) for correct answers. The corresponding ANOVA showed a main effect of accuracy, F(1, 106) = 20.55, p < .001, ηp2 = .22, indicating shorter completion time with correct answers than with incorrect answers. The main effect of grade, F(1, 106) = 4.52, p = .04, ηp2 = .04, and the interaction of accuracy × grade, F(1, 106) = 9.66, p = .002, ηp2 = .08, suggested that 8th graders invested relatively more time in items that were difficult for them (incorrect answers) than in less difficult items (correct answers), whereas 5th graders invested rather similar time no matter whether the answer was correct or incorrect. This accuracy × grade interaction confirmed Hypothesis 1. The main effect of school track, F(1, 106) = 4.29, p = .04, ηp2 = .04, and the interaction of accuracy × school track, F(1, 106) = 5.89, p = .02, ηp2 = .05, suggested that non-AT students differentiated less in time investment between items that were difficult for them (incorrect answers) and less difficult items (correct answers), whereas AT students invested relatively more time on items that were difficult for them (incorrect answers) than on less difficult items (correct answers). No other effect was found: school track × grade, F(1, 106) = 1.86, p = .18, ηp2 = .02; accuracy × school track × grade, F(1, 106) = 2.44, p = .12, ηp2 = .02.

Survival analysis

The survival analysis (cf. Bacca-Acosta & Avila-Garzon, 2021; Wright, 2000) was adopted to chart where the survival rate (percentage of not-yet-completed) differed between grade 5 and grade 8 for incorrectly vs. correctly answered questions. In the current study, the longest reading time for a specific unit (e.g., Savannah) was 399 s among all the participants. This duration was divided into ten time bins with 39.9 s each. The chart thus plots what percentage of participants has not completed the task by the first, second… and tenth time bins of 39.9 s. It is important to note that we only report participants who had both correct and incorrect responses in one of the reading conditions.
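The binning described above can be sketched as follows. This is an illustrative reconstruction, not the original analysis script: the function name and the example completion times are hypothetical, and a participant is assumed to count as "not yet completed" in a bin if the completion time exceeds the upper edge of that bin.

```python
def survival_rates(completion_times_s, longest_time_s=399.0, n_bins=10):
    """Percentage of participants who have not completed the task by the end of each bin.

    Assumption: a participant counts as not yet completed in bin k if the
    completion time is strictly greater than the upper edge of bin k.
    """
    n_participants = len(completion_times_s)
    rates = []
    for k in range(1, n_bins + 1):
        bin_end_s = longest_time_s * k / n_bins  # 39.9 s, 79.8 s, ..., 399 s
        not_yet_done = sum(1 for t in completion_times_s if t > bin_end_s)
        rates.append(100.0 * not_yet_done / n_participants)
    return rates


# Hypothetical completion times (in seconds) for five participants.
example_times_s = [20.0, 35.0, 50.0, 70.0, 399.0]
rates = survival_rates(example_times_s)
print(rates)
```

Computing one such curve per grade, school track, and answer accuracy yields the kind of comparison plotted in the survival charts.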

Question-after-material

For AT students, the survival analyses showed a consistent pattern for both grades that across many bins more participants have not yet completed the task with incorrect answers compared to correct answers (see Fig. 4a). For instance, around 30% of 5th graders have not yet completed the task within the first two bins of 39.9 s with incorrect answers compared to around 10% for correct answers. Around 30% of 8th graders have not yet completed the task in the second time bin with incorrect answers compared to 0% for correct answers. For non-AT students, this pattern was different for 5th graders compared to 8th graders. The survival rate did not differ between incorrect and correct answers for 5th graders. Yet, while almost all 8th graders completed their correct answers within the first two bins, around 20% have not yet completed for incorrect answers.

Fig. 4

Survival rates (i.e., the percentage of participants who have not yet completed the task) in the time course of 399 s (over 39.9 s bins) for academic track (AT) and non-academic track (non-AT) students from grade 5 (G5) and grade 8 (G8) in question-after-material (a) and question-before-material conditions (b)

Question-before-material

For AT students, the survival analyses again showed a consistent pattern for both grades: more participants had not yet completed the task with incorrect answers compared to correct answers (see Fig. 4b). Around 90% of 5th graders had not yet completed the task within the first two bins of 39.9 s with incorrect answers compared to around 60% for the correct answers. Around 70% of 8th graders took longer than two bins for their incorrect answers compared to around 40% for the correct answers. For non-AT students, the survival rate even seemed to drop more quickly for incorrect as compared to correct answers in 5th graders, suggesting premature termination of processing when facing difficulties. For 8th graders, the usual pattern was present (i.e., faster decay of the rate of students who had not yet completed the task for correct compared to incorrect answers).

Fixation patterns on texts and pictures

As the above results suggest that premature responding is prevalent in some groups of students, further analyses explored how the usage of text and pictures differs for early vs. late responders. Given that text and pictures ought to play different roles in the generation vs. adaptation of mental models, fixation patterns on text vs. picture might help to trace potential imbalances in the usage of material over time that might be associated with erroneous responses.

A median split on completion time was applied for each topic (e.g., Savannah) within each age group to separate early from normal responders. To test Hypothesis 2, cases of 5th graders who completed reading early and had incorrect answers (G5-early-incorrect) were compared, in each reading condition, to participants in both grades who had normal completion time and incorrect answers (normal-incorrect) and to participants in both grades who had normal completion time and correct answers (normal-correct).
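The grouping logic can be illustrated with a short sketch. The function names and example data are hypothetical, and because the tie-handling rule is not stated, completion times below the median are assumed to count as early and all others as normal.

```python
from statistics import median


def label_responders(completion_times_s):
    """Median split within one topic and one age group.

    Assumption: times strictly below the median are 'early', all others 'normal'.
    """
    cutoff_s = median(completion_times_s.values())
    return {
        participant: ("early" if time_s < cutoff_s else "normal")
        for participant, time_s in completion_times_s.items()
    }


def assign_group(grade, speed, correct):
    """Map one case (participant x unit) onto the three compared groups, else None."""
    if grade == 5 and speed == "early" and not correct:
        return "G5-early-incorrect"
    if speed == "normal":
        return "normal-correct" if correct else "normal-incorrect"
    return None  # cases outside the three groups are not part of the comparison


# Hypothetical completion times (s) of four 5th graders on one topic.
times_s = {"p1": 30.0, "p2": 45.0, "p3": 60.0, "p4": 90.0}
labels = label_responders(times_s)
print(labels, assign_group(5, labels["p1"], correct=False))
```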

The processing phase of each participant on each unit was divided into quintiles to capture the dynamics of processing the material. For each quintile, the percentage of fixations on the picture and the percentage of fixations on the text were determined. These did not add up to 100%, as participants could also fixate the item or empty space. The three groups (G5-early-incorrect vs. normal-incorrect vs. normal-correct) consisted of the aggregated cases from the 6 units (e.g., Savannah) with either correct or incorrect answers and with early or normal completion time. One participant could belong to more than one group. For instance, a participant from grade 5 could answer the topic Savannah incorrectly with early completion time (G5-early-incorrect) and answer the other topics correctly with normal completion time (normal-correct). The three aggregated groups were then compared, with each quintile entering the analysis as a row.
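How such quintile percentages might be derived from raw fixation data is sketched below. Several details are assumptions, since they are not specified here: each fixation is assigned to a quintile by its start time, and the percentages are taken relative to the summed fixation duration within that quintile. The function name, the AOI labels, and the example fixations are hypothetical.

```python
def quintile_percentages(fixations, total_time_s, n_parts=5):
    """Percentage of fixation duration on text and picture per quintile.

    fixations: list of (start_s, duration_s, aoi) tuples, where aoi is a label
    such as 'text', 'picture', 'question', or 'other'.
    Assumptions: a fixation belongs to the quintile in which it starts, and the
    denominator is the summed fixation duration within that quintile.
    """
    sums = [{} for _ in range(n_parts)]
    for start_s, duration_s, aoi in fixations:
        index = min(int(start_s * n_parts / total_time_s), n_parts - 1)
        sums[index][aoi] = sums[index].get(aoi, 0.0) + duration_s

    result = []
    for part in sums:
        total = sum(part.values())
        if total == 0:
            result.append({"text": 0.0, "picture": 0.0})
        else:
            result.append({
                "text": 100.0 * part.get("text", 0.0) / total,
                "picture": 100.0 * part.get("picture", 0.0) / total,
            })
    return result


# Hypothetical fixations for one participant on one unit lasting 10 s.
example_fixations = [
    (0.0, 0.5, "text"),
    (1.0, 0.5, "picture"),
    (1.5, 1.0, "question"),
    (9.0, 1.0, "picture"),
]
percentages = quintile_percentages(example_fixations, total_time_s=10.0)
print(percentages)
```

In the first quintile of this example, text and picture each account for 25% of the fixation time, which illustrates why the two percentages need not add up to 100%.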

Fixations on pictures

A 2 (condition: question-after-material vs. question-before-material) × 3 (group: G5-early-incorrect vs. normal-incorrect vs. normal-correct) repeated-measures ANOVA was performed on the percentage of fixations on pictures. The results yielded a main effect of condition, F(1, 4) = 8.31, p = .045, ηp² = .68, indicating higher usage of pictures in the question-after-material condition (prior experience of the material) than in the question-before-material condition (see Fig. 5a). There was a main effect of group, F(2, 8) = 68.73, p < .001, ηp² = .94. Post hoc comparisons using the Tukey HSD (honestly significant difference) test suggested that the G5-early-incorrect group (i.e., 5th graders, early completion time, and incorrect responses) fixated less on pictures (M = 19.48%, SD = 9.44%) than the other two groups in both the question-after-material condition and the question-before-material condition, ps = .001. This confirmed Hypothesis 2a and Hypothesis 2b. There was no difference between the normal-incorrect and normal-correct groups (p = .99). The normal-incorrect group (i.e., individuals from both grades, normal completion time, and incorrect responses) had on average 33.07% (SD = 12.43%) accumulated fixations on pictures. The normal-correct group (i.e., individuals from both grades, normal completion time, and correct responses) had on average 33.34% (SD = 11.41%) accumulated fixations on pictures. There was no condition × group interaction, F < 1.

Fig. 5

Percentage of accumulated fixation durations on picture (a) and on text (b) in five quintiles (20% of the reading time) in question-after-material and question-before-material conditions

Fixations on texts

The corresponding ANOVA performed on the percentage of fixations on texts revealed a main effect of condition, F(1, 4) = 8.54, p = .043, ηp² = .68, suggesting lower usage of texts in the question-after-material condition (prior experience of the material) than in the question-before-material condition (see Fig. 5b). There was a main effect of group, F(2, 8) = 9.78, p = .007, ηp² = .71, and an interaction effect of condition × group, F(1.02, 4.07) = 35.18, p = .004, ηp² = .90 (Greenhouse-Geisser corrected). This suggested a larger effect of group in the question-before-material condition than in the question-after-material condition. The post hoc test suggested fewer fixations on texts in the G5-early-incorrect group (i.e., 5th graders, early completion time, and incorrect responses) in the question-before-material condition (M = 33.19%, SD = 12.5%) compared to the other groups [normal-incorrect group (i.e., individuals from both grades, normal completion time, and incorrect responses): M = 56.66%, SD = 21.17%, p = .01; normal-correct group (i.e., individuals from both grades, normal completion time, and correct responses): M = 57.79%, SD = 24.45%, p = .02]. This was consistent with Hypothesis 2b. There was no difference between the normal-incorrect and normal-correct groups, p = .54.
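
For readers who wish to check the effect sizes reported in the two ANOVAs, partial eta squared can be recovered from the F value and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The helper below is a small illustrative sketch; the function name is an assumption and not part of the study's analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for the ANOVA on fixations on texts.
print(round(partial_eta_squared(8.54, 1, 4), 2))         # condition: 0.68
print(round(partial_eta_squared(9.78, 2, 8), 2))         # group: 0.71
print(round(partial_eta_squared(35.18, 1.02, 4.07), 2))  # interaction: 0.9
```

The recomputed values agree with the reported effect sizes for fixations on texts. Small deviations in the second decimal can arise elsewhere because the published F values are themselves rounded.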

Discussion

This study investigated the relationship between grade and multimedia learning, prior to correct vs. incorrect responses. Completion time and survival analysis were used to examine which group of participants tends to terminate prematurely. The fixation patterns on texts and pictures were used to indicate the reasons for premature termination. The study has two main findings: lower grade students tend to give up early, and this is possibly due to problems in text-picture integration.

Individuals from lower grades tend to give up early

Partially in line with Hypothesis 1, there was an interaction between grade (5 vs. 8) and accuracy (correct vs. incorrect) on task completion time in the question-before-material condition. Yet, such an interaction was not found in the question-after-material condition. This interaction provides insights concerning the (lack of) processing before attempting to answer the question. When the upcoming answer is incorrect, additional cognitive processing would have been needed to update and adapt the mental model to the task demands (Zhao et al., 2020b). The results suggest that 8th graders tended to be more patient and invested more extra reading time in apparently difficult items than 5th graders. In contrast, 5th graders tended to give up early when they did not know the answer. Interestingly, the analyses on completion time and survival rate revealed that non-AT 5th graders invested about equal time on items that were difficult for them (incorrect answers) and on items that were less difficult (correct answers), whereas AT students from both grades invested more time on items that were difficult for them than on items that were less difficult (see Figs. 3 and 4).

Motivation plays a critical role in task completion, especially among learners from lower grades. When these learners lack motivation, they can feel helpless, which is a negative attitude towards self-control after repeated failures (Seligman & Maier, 1967). The most common manifestation of feeling helpless is to attribute failures to external or internal uncontrollable causes, such as a high level of task difficulty or low competence (Cullen, 1985). Learners who feel helpless are less likely to make an effort, even though opportunities are available to make a change. It is therefore unlikely that they will try an alternative solution when they encounter difficulties in solving the task. They come to believe that their failure to solve the task is due to uncontrollable reasons and that nothing can be done to prevent the failure.

Younger learners tend to have more limited knowledge and cognition serving their metacognition than their older peers (Flavell et al., 1970). Therefore, 5th graders are more likely to experience helplessness than their older peers in grade 8 (Borkowski et al., 1990; Paris & Winograd, 1990). A previous study (Chan, 1994) examined the developmental pattern of attributional beliefs in reading comprehension in students from grades 5, 7, and 9 (10 to 15 years old). The younger students in grade 5 were more likely to show patterns of feeling helpless. They attributed their failures to a lack of ability. In contrast, older students in grade 9 reported a rather stable concept of their ability. They tended to attribute failures to insufficient effort and ineffective strategy use. This should provide older students with stronger beliefs in personal control over learning achievement through effort and strategy. Future work should, on the one hand, combine eye tracking in multimedia learning with motivational measures, to follow up on the potential interaction of motivational processes and multimedia learning. On the other hand, further studies should use the data on appropriate vs. less appropriate distribution of attention prior to answering attempts to tailor feedback-based interventions. Moreover, they should assess whether these interventions are indeed suitable to (1) change how multimedia material is processed and (2) lead to an attribution pattern focusing on internal and controllable factors (rather than stable and uncontrollable factors) with respect to success on multimedia items. Learners might experience that a large part of the success or failure when working on an item can come from using text and picture materials in an appropriate order and with appropriate focus.

An upcoming correct answer suggests that the mental model includes the specific details required to solve the task. It is thus likely that little or no extra processing is required for the adaptive mental model specification. Learners from lower grades tend to spend more time on task completion than learners from higher grades do. This is mainly due to the lower prior knowledge in reading and the lower competence in understanding the task among 5th graders compared to 8th graders. The language literacy of 5th graders is still developing, and 5th graders still lack fluency in syntactic and semantic processing (Hochpöchler et al., 2013; Schnotz & Wagner, 2018). For the same amount of information, 5th graders need much more time for processing than their older peers do. The results also correspond to a speed-accuracy trade-off (cf. Kail, 1985). Lower grade students seem to compromise accuracy while prioritizing speed. They seem to answer prematurely rather than investing extra time in apparently difficult items.

Premature giving up can be due to inappropriate use of texts and pictures

In the question-after-material condition, learners have experienced the text and pictures once, and they have constructed the initial mental model before they encounter this condition (see Fig. 2). As the question is additionally shown in this condition, learners should mainly engage in adaptive mental model specification. In the question-before-material condition, learners have encountered questions at the beginning but still need to construct the initial mental model before it can be adapted to specific purposes. Learners should engage first in initial mental model construction and then in adaptive mental model specification. In accordance with previous studies (Schnotz & Wagner, 2018; Zhao et al., 2020b), texts and pictures accomplish different functions in their conjoint processing. Processing is shifted towards pictures during the adaptive mental model specification, whereas it is shifted towards texts during the initial mental model construction. In line with Hypothesis 2a, lower grade students in trials with early completion time and incorrect responses fixated less on pictures compared to learners with normal completion time in the question-after-material condition involving adaptive mental model specification. Consistent with Hypothesis 2b, lower grade students in trials with early completion time and upcoming incorrect responses fixated less on texts and on pictures in the question-before-material condition, which involves first initial mental model construction and later adaptive mental model specification, compared to learners with normal completion time. This suggests that these learners have an inappropriate approach to multimedia learning. They have not realized the importance of the cognitive functions of texts in initial mental model construction. In addition, they have not effectively used pictures in adaptive mental model specification. The result corresponds to previous studies (Hochpöchler et al., 2013; Schnotz & Wagner, 2018), which demonstrated that older children are more capable of using texts and pictures adequately and effectively than younger children are. It is thus conceivable that, besides motivational problems, younger learners have adopted an unfavorable approach in the processing of adaptive mental model specification. This may be the reason for their tendency to give up prematurely.

The results suggest that learners from lower grades are prone to giving up early prior to incorrect responses, as they have not properly used texts in initial mental model construction and pictures in adaptive mental model specification. They may not have established the mental model well and therefore could not adapt it to the task demands. As their constructed mental model does not meet the task demands, these learners tend to give up early instead of investing more time to search for alternative strategies to solve the task.

Limitations

Based on eye tracking data, the current work shows that (un)successful processing of multimedia material is characterized by distinct patterns of text and picture usage among 5th vs. 8th graders. This provides the ground for future studies combining eye tracking with motivational and metacognitive measures and interventions (Riemer & Schrader, 2019). Furthermore, while the current study involved a measure of verbal ability, an assessment of reading ability that is independent of the multimedia task should be included in future studies.

This study had an unbalanced number of females and males in each grade. While analyses presented in the Appendix suggest that this was not driving the results, future research might take into account the potential moderating effect of gender on premature termination in multimedia learning, as females are at higher risk for helplessness than males (McKean, 1994). The impact of the complexity of the item (Hochpöchler et al., 2013) and how children perform after receiving negative vs. positive feedback (Craske, 1988) should be examined in future studies. The effect of text genre and previous text experience on completion rate and text-picture integration should be further examined, as this study used only scientific texts, whereas other text genres are also taught in the curricula of grade 5 and grade 8 (German Society for Geography, 2020; Kultusministerkonferenz, 2004). Also, the influence of picture type remains to be investigated, as the type and function of pictures (e.g., thematic maps, pie charts, Origami folding steps) may affect the construction of mental models and the findings of the experiment (Schnotz et al., 1999; Zelazny, 2006; Zhao et al., 2020c).

Pedagogical implications

This study suggests pedagogical implications for online and classroom learning environments. First, it provides a basis for automated feedback during the course of multimedia learning. When students from lower grades start to give an answer more quickly than the average answering time and the recorded usage of text and pictures seems inappropriate, computerized learning systems should detect this discrepancy and provide immediate feedback to motivate these students to consider processing the material more thoroughly before entering an answer. Second, instructional design should be tailored to meet the needs of learners corresponding to their zones of proximal development (Vygotski, 1963). As students from lower grades are prone to give up early, the instructions for these students should be more encouraging and should provide suggestions to focus more on text or on pictures depending on the cognitive processing required.
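
As a rough illustration of the first implication, a learning system could combine answering time and recorded material usage in a simple rule. The sketch below is hypothetical: the function name, its parameters, and in particular the two usage thresholds are placeholder assumptions and are not derived from the data of this study.

```python
def needs_prompt(elapsed_s, mean_time_s, picture_share, text_share,
                 min_picture_share=25.0, min_text_share=45.0):
    """Decide whether to prompt a learner to process the material more thoroughly.

    elapsed_s:      time the learner has spent before starting to answer (s)
    mean_time_s:    average answering time for this item (s)
    picture_share:  percentage of fixation time spent on the picture so far
    text_share:     percentage of fixation time spent on the text so far
    min_*_share:    hypothetical thresholds; they would need empirical calibration
    """
    answering_early = elapsed_s < mean_time_s
    low_usage = picture_share < min_picture_share or text_share < min_text_share
    # Prompt only when both signals coincide: a quick answer alone may simply be a known answer.
    return answering_early and low_usage


# A learner who starts to answer after 20 s on an item that takes 40 s on average.
print(needs_prompt(elapsed_s=20, mean_time_s=40, picture_share=19.5, text_share=33.2))
```

Requiring both conditions reflects the wording above: feedback is triggered when an answer comes faster than average and the usage of text and pictures seems inappropriate, not by speed alone.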

Conclusion

This study shed light on the processing of science multimedia materials in students from different grades and school tracks. On the one hand, students from lower grades and non-academic school tracks seemed to be more likely to give up early. On the other hand, early incorrect answers were characterized by an inadequate level of fixating on picture and text elements. While highly regulated learners might adjust their accuracy, pace, and metacognitive resources to the task, this might not be the case for less regulated learners. While looking patterns as well as early answers might serve as a starting point for computer-based interventions, future studies should also use supplementary information sources to further test these potential links between overt behavior, mental model construction and adaptation, and metacognition. For instance, thinking-aloud data (Hu & Gao, 2017) can help to better explain why learners from lower grades give up early prior to incorrect answers. Children between ages 11 and 14 are undergoing early adolescence with biological and cognitive changes and are developing a sense of self-esteem, competencies, and individuality (Eccles, 1999). It is thus essential to help them build confidence in themselves by motivating them to keep investing effort even in difficult and challenging tasks. We hope that a deeper understanding of the relationship between grade and text-picture integration can eventually enhance teaching practices in multimedia learning.