There's more to the multimedia effect than meets the eye: is seeing pictures believing?

Textbooks in applied mathematics often use graphs to explain the meaning of formulae, even though their benefit is still not fully explored. To test processes underlying this assumed multimedia effect we collected performance scores, eye movements, and think-aloud protocols from students solving problems in vector calculus with and without graphs. Results showed no overall multimedia effect, but instead a tendency to confirm statements that were accompanied by graphs, irrespective of whether these statements were true or false. Eye movement and verbal data shed light on this surprising finding. Students looked proportionally less at the text and the problem statement when a graph was present. Moreover, they experienced more mental effort with the graph, as indicated by more silent pauses in thinking aloud. Hence, students actively processed the graphs. This, however, was not sufficient. Further analysis revealed that the more students looked at the statement, the better they performed. Thus, in the multimedia condition the graph drew students' attention and cognitive capacities away from the statement. A good alternative strategy in the multimedia condition was to look back and forth frequently between graph and problem statement, and thus to integrate their information. In conclusion, graphs influence where students look and what they process, and may even mislead them into believing accompanying information. Thus, teachers and textbook designers should think critically about when to use graphs and carefully consider how the graphs are integrated with other parts of the problem.


Introduction
Mathematical textbooks often include different forms of pictures (such as illustrations, graphs, diagrams, etc.). The reasons are twofold. On the one hand, teachers and textbook designers generally believe that pictures help students to better understand the material. On the other hand, cognitive theories of information processing recommend enriching scientific texts with pictures to support students in building a rich and coherent mental model of the subject matter (i.e., multimedia effect). However, there are two critical points to this view. First, theories underlying the multimedia effect make statements about perceptual processes that have not been verified directly. Second, recent empirical research questions this general beneficial effect of pictures and even suggests that pictures may bias people into being uncritical towards scientific texts (i.e., picture bias effect) (McCabe and Castel 2008). Thus, in this study, we (1) investigate the multimedia effect and its underlying cognitive and perceptual processes directly with think-aloud protocols and eye tracking and (2) test the picture bias effect. Both investigations are in the context of mathematical education at a university level (vector calculus).

Basic assumptions of learning with multimedia
Material that presents information in different formats, such as text, pictures, diagrams, and formulae, is referred to as multimedia. Two leading theories describe how the human cognitive system processes multimedia material, namely the Cognitive Theory of Multimedia Learning (CTML; Mayer 2005a) and the Cognitive Load Theory (CLT; Sweller et al. 1998). Both theories assign a central role to working memory (Baddeley et al. 1992). They make three assumptions about the functioning of working memory.
First, for information to be learned and successfully stored, it has to be actively processed in working memory. Mayer (2005a) describes active processing in three steps: Information has to be selected from a source by means of attention to enter working memory. Next, this information has to be organized into mental models. Last, these mental models have to be integrated with each other and with prior knowledge from long-term memory. Only information that has been processed in such a way can be stored in long-term memory. The 'select' and 'integrate' processes refer to perceptual processes; however, these were only theoretically deduced and not directly tested.
Second, working memory is of limited capacity, which must not be exceeded. Sweller et al. (1998) proposed that working memory capacity can be filled with three types of load, namely load caused by active processing of the information (germane load), load stemming from the difficulty of the task (intrinsic load), and load stemming from other unnecessary cognitive processes that do not contribute to executing the task at hand (extraneous load). The amount of cognitive load imposed upon working memory (i.e., mental effort) can be measured with different methods, such as subjective rating scales (Paas 1992) or silent pauses in thinking aloud (Yin and Chen 2007; Jarodzka et al. 2015).
Third, both theories assume that two separate channels exist for processing verbal and pictorial information (Baddeley et al. 1992; Paivio 1986). In the earlier steps of information processing the two channels are independent; hence, they are loaded with information separately. In later processing steps this information is integrated, which leads to a richer mental model than one based on a single modality (either pictures or words).
Based on these three assumptions both theories (CLT, CTML) provide guidelines on how multimedia material should be designed to optimize cognitive processing of information, as will be described in the next section. It is important to note that both theories and also their resulting guidelines refer to learning. Nevertheless, we argue that these theories as well as their resulting guidelines can be adapted to task performance without a specific learning intention, because they are built upon general assumptions about the human cognitive system: First, the assumption that the human cognitive system is limited in capacity with respect to how much information it can process at a time (not in long-term memory storage, though) dates back to early research on the structure and functioning of human working memory (e.g., Baddeley et al. 1992; Miller 1956). Hence, this assumption holds true not only for learning, but also for general task performance. Second, the active processing assumption is based on Atkinson and Shiffrin's information processing model (Atkinson and Shiffrin 1968), which again is not a specific learning model, but instead describes general information processing. Thus, in this study we applied these principles to task performance, i.e., solving a problem in vector calculus.

Guidelines for designing multimedia material: the multimedia principle
One of the basic guidelines of the above-mentioned multimedia theories (CLT, CTML) is the multimedia principle, which assumes that "people learn better from words and pictures than from words alone" (Fletcher and Tobias 2005; Mayer 2001a). The main idea is that text and pictures evoke different cognitive processes resulting in different mental models which, when later integrated, result in a richer mental model compared to one of the models alone. Moreover, when information is presented both in a pictorial and a textual manner, students can use both processing channels in parallel and use their working memory more efficiently. This enables an active processing of information.
A long history of research provides evidence for the multimedia principle (for instance, see the research conducted by the research group of Richard Mayer). Mayer (2001b) reports nine of his own studies, all with beneficial learning effects when pictures accompany text. Confirming this positive effect of pictures, Carney and Levin (2002) present a review of 18 articles from the 1990s reporting beneficial learning effects of pictures accompanying texts.
In one of his articles, Mayer (1989) showed that learning about car mechanics improved when text was accompanied by pictures compared to text only (or pictures only). He explains the multimedia effect by the fact that such illustrations helped students to "focus their attention" and "organize the information into useful mental models". However, these conclusions were not directly tested.
Thus, both theory and empirical research state that pictures accompanying texts in mathematical problem solving reduce mental effort and help the students to focus their attention, although these assumptions were often deduced from improved task performance rather than tested directly.

Limitations and restrictions of the multimedia effect
Several empirical studies provide challenges for the multimedia principle. Often, students do not make use of pictures as was intended. For instance, Berends and van Lieshout (2009) found that school children do not benefit from pictures in mathematical problem solving as much as intended. The authors concluded that integrating two information sources probably required more working memory capacity than available (for similar findings in school exams see Crisp and Sweiry 2006). In line with these findings, Holsanova et al. (2009) found in a naturalistic newspaper reading study that if pictures and text are given in a standard format, i.e., where they are presented separately in a 'split' format, readers often do not make the effort to integrate these information sources (as shown by little visual integration between both information sources indicated with eye tracking). Thus, providing additional information in graphs, irrespective of whether it is relevant to the task, requires additional cognitive resources. If these resources are not available or not allocated correctly, graphs can even be harmful.

Bias towards believing
Other researchers see the additional use of pictures even more critically. Lenzner et al. (2013), for instance, showed that pictures reduce the perceived difficulty of a learning material. This could be very dangerous as students might put too little mental effort into understanding the text so that they do not process all information actively (i.e., select all relevant information from all possible information sources, organize it into coherent mental models, and integrate it), which in turn would result in poorer task performance.
Other lines of research unrelated to learning or instruction also critically investigate the effect of pictures. Isberner et al. (2013) found that graphs increased the perceived plausibility of conflicting information in science text. Again, this is problematic as it could result in students overlooking logical flaws in a text and thus not being able to build a coherent mental model of the task at hand, again resulting in poorer task performance. McCabe and Castel (2008) showed that the mere presence of an illustration increased the perceived credibility of a scientific text. The readers were less critical of the arguments of a scientific text when it was accompanied by a scientific illustration. As with the other examples, this uncritical attitude towards a text prevents students from building a coherent mental model of its content. Therefore, they are not able to draw the correct conclusions from this mental model when it has to be applied to perform a particular task. Hence, pictures that are of a scientific nature may easily be perceived as a proof of the accompanying text and mislead students into believing it, irrespective of whether they add to its arguments or not. Only a careful integration of both information sources could prevent someone from making this mistake.

Vector calculus as an exemplary mathematical domain
In the present study the multimedia and picture bias effects were investigated in the domain of vector calculus. We chose this domain for two reasons. First, vector calculus is a crucial foundation for studies in mathematics and is used in many branches of physics and engineering (for details on the Swedish curriculum in vector analysis see Griffiths 1999; Ramgard 1996; Persson and Böiers 1988). Second, vector calculus is a very visual topic where an abstract mathematical formula can often be accompanied by a direct graphical representation. One of the authors of this article has been teaching courses in vector calculus and has discussed the topic with several colleagues from different countries. It is a common belief among teachers we have talked to that a key to understanding vector calculus is to be able to switch between different representations of a problem, and successfully integrate the information from all representations into one coherent mental model. This refers to a deeper form of understanding, necessary for instance to be able to apply relevant knowledge to new applications.

The present study
In this study we investigate whether we can find a general multimedia effect for mathematical problem solving in vector calculus by comparing problem solving tasks with and without accompanying graphs. An example problem is shown in Fig. 2. Furthermore, we test whether these graphs bias students into believing their accompanying texts by asking students to reject or confirm statements about the task. To better understand the processes underlying the multimedia effect, we use two process-tracing measures: eye tracking (Holmqvist et al. 2011) and verbal reporting (Ericsson and Simon 1993). Eye tracking tells us which areas students visually select information from and how they visually integrate these areas. Concurrent verbal reporting may provide insight into the amount of mental effort invested by students (Yin and Chen 2007; Jarodzka et al. 2015). Moreover, it can deliver qualitative information about the underlying processes by serving as a dual-task measure of mental effort (e.g., Brunken et al. 2003; Park et al. 2015). We hypothesize the following with respect to performance (H1 and H2) and processes (H3a, H3b, and H3c).
H1 Performance (i.e., correctly confirming or rejecting a problem statement) is higher with than without graphs, that is, we expect a multimedia effect.
H2 Confirming the problem statement is more likely with than without graphs, that is, we expect a picture bias effect. As a result of the picture bias, we also expect higher performance in the multimedia condition when statements are to be confirmed, compared to the control condition (without a graph).
H3 Students process information differently depending on whether a graph is present or not. In particular, we expect:
H3a If a graph is present, students search and select information from it. This shows in time spent looking at the graph. Furthermore, as we expect a multimedia effect, we consequently expect that searching and selecting information from the graph is positively related to task performance. In addition, we explore to what extent this shift of attention towards the graph happens at the expense of the other information areas (text and formula input and problem statement) and to what extent attending to these is related to performance; we predict higher performance the more the graphs are attended to.
H3b If a graph is present, students integrate information from it with information from other sources, such as the input (text and formula) and the problem statement. This shows in the number of transitions between the graph and the other information sources. Furthermore, as we expect a multimedia effect, we consequently expect that integrating information from the graph with other sources is positively related to task performance.
H3c In problems with graphs students use more mental effort than in problems without graphs, because they need to process more information. This becomes evident in the overall proportion of silence calculated directly from the recorded sound file. A higher proportion of silence is predicted when graphs are present, as a result of the increased mental effort.
Moreover, as an open research question (RQ1), we investigate in two contrasting cases the extent to which participants follow the processes predicted by the CTML (search information, build a mental model, activate prior knowledge, integrate information, and form a problem solution). In addition, we investigate their meta-cognitive and off-topic statements.

Method

Participants and design
Thirty-six students (three females) with an average age of 21.5 years (SD = 3.0) took part in the experiment. They studied engineering physics (F) at the Lund Institute of Technology, and were 2 weeks into a basic course in vector calculus. Hence they should be considered a fairly uniform population with respect to their study background. All students had normal or corrected-to-normal (i.e., with glasses or lenses) vision. They were randomly assigned to one of two conditions in a between-subject design: one solving eight problems without graphs (N = 16), and one solving the same problems with graphs (N = 20).

Stimuli
The stimuli consisted of eight problems dealing with basic concepts in vector calculus. They concerned, for example, simple cases of integration along curves in a two-dimensional domain, the interpretation of the gradient for a function of two variables, and Gauss' formula in three dimensions. Each problem was composed of a text and a formula that described a general context, a problem statement that was to be confirmed or rejected and, in the multimedia condition, a graph. In this article the word graph is used in a broader sense than in mathematics texts on, e.g., graph theory or functions; we include these meanings but are not restricted to them. In three of the problems, the correct answer was to confirm the problem statement. In the remaining five, rejecting the statement was the correct answer.
The graphs were designed by a lecturer in vector calculus to support students by describing a particular problem-related concept visually. In fact, all the graphs used in the study could naturally be part of a textbook in vector calculus. Importantly, the students had not seen any of the problems before. The graphs were interpretational in nature, and should therefore have a substantially positive effect on problem solving. All problems could be solved without having access to the graph. For example, many of the problems can be solved algebraically without using a mental geometric representation. Example graphs can be found in Fig. 1a, which shows level curves of a function of two variables, and Fig. 1b, which depicts a curve that is restricted to a sphere. Note that the vectors are not labeled, so it was left to the students to identify and potentially use them when testing a particular solution strategy.
Each problem was saved as a grayscale PNG image with a resolution of 1680 × 1050 pixels. This resulted in a total of 16 stimulus images, eight for each group. The problems were presented in a random order.

Apparatus
The experiment was performed with a Dell laptop (Intel Core i7 CPU 2.67 GHz, 2.98 GB RAM) running Windows XP Professional (v. 2002, SP 3). Stimuli were presented with Experiment Center (v. 3.0.128) on a Dell 22 inch computer screen with a resolution of 1680 × 1050 pixels and a refresh rate of 60 Hz. Eye movements were recorded at 250 Hz with the RED250 eye tracker from SensoMotoric Instruments (Teltow, Germany) running iView X (v. 2.7.13). Data from the left and right eyes were averaged during recording, and therefore only one gaze coordinate represented the data for both eyes at each time instant.

Procedure
After an introduction to the experiment and after viewing an example problem not included in the actual test, participants were calibrated with a five-point calibration followed by a four-point validation of the calibration accuracy. Recalibrations were initiated when the operator, who watched the eye image in iView X and the stimulus with overlaid gaze position in Experiment Center, judged that it was necessary. The average accuracy from all accepted calibrations reported by iView X was 0.5° (SD = 0.18°) horizontally and 0.55° (SD = 0.29°) vertically.
Each trial started with a centrally located fixation cross that was presented until the software detected a fixation within a 1° square centered on the cross. Then the problem appeared, and the participants were free to inspect the problem for a maximum of 120 s. If they felt that they were ready to provide an answer sooner, they could do so by pressing the spacebar to answer two questions: first, participants were asked whether they thought the statement in the problem was true or false and, second, how certain they were in their answer on a scale from 1 (very unconfident) to 7 (very confident). Throughout the eye-tracking experiment, participants were asked to verbalize their thoughts as they solved the problems, according to the methodology described in Holmqvist et al. (2011, Ch. 3) concerning training, instruction, and prompting. Written consent was given by all participants, who received two movie theater tickets as compensation for participating.
Fig. 1 a Corresponding perpendicular vectors, which are proportional to the gradient, are also shown. b This graph, which was used in problem P3, had the strongest effect on performance of all the problems in our study. As is discussed in the text, it strongly supports one of the two major possible solution strategies used by the participants, which involves depicting the dashed vector (not shown to the students); this solution strategy was very rare in the group not having access to this graph.

Data analysis
Fixations and saccades were calculated from raw data samples with BeGaze (v. 3.1 Build 152) using default settings.
Eye tracking data were analyzed by means of specific areas of interest (AOIs) which we defined for each problem. AOIs are coherent parts of the screen, for which eye tracking parameters were summarized. Figure 2 depicts a multimedia problem with input, problem statement, and graph, where AOIs are outlined by black rectangles and the name of the AOI is found in the upper left corner of the AOI. AOI names and rectangles were not shown to the participants. Specifically, we calculated total dwell time (sum of all time spent looking inside an AOI) from raw data samples and transitions between the AOIs from fixation and saccade data.
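The two AOI measures can be illustrated with a minimal Python sketch. It is not the BeGaze computation used in the study: the function names, the toy data, and the AOI labels are hypothetical, each gaze sample or fixation is assumed to be already mapped to an AOI label (or None when it falls outside every AOI), proportions are taken relative to all samples inside any AOI, and transitions are counted without regard to direction.

```python
from collections import Counter

SAMPLE_RATE_HZ = 250  # recording rate reported in the Apparatus section


def dwell_times_ms(sample_aois, sample_rate_hz=SAMPLE_RATE_HZ):
    """Total dwell time per AOI in ms, given one AOI label per raw gaze sample.

    A label of None marks a sample that falls outside every AOI.
    """
    ms_per_sample = 1000.0 / sample_rate_hz
    counts = Counter(label for label in sample_aois if label is not None)
    return {aoi: n * ms_per_sample for aoi, n in counts.items()}


def dwell_proportions(sample_aois):
    """Share of dwell time per AOI, relative to all samples inside any AOI.

    The choice of denominator is an assumption of this sketch.
    """
    counts = Counter(label for label in sample_aois if label is not None)
    total = sum(counts.values())
    return {aoi: n / total for aoi, n in counts.items()} if total else {}


def count_transitions(fixation_aois):
    """Count undirected transitions between different AOIs in a fixation sequence.

    Fixations outside every AOI (None) are dropped first, which is an assumption.
    """
    visited = [label for label in fixation_aois if label is not None]
    transitions = Counter()
    for previous, current in zip(visited, visited[1:]):
        if previous != current:
            transitions[frozenset((previous, current))] += 1
    return transitions


# Hypothetical toy data, not taken from the study.
samples = ["input"] * 5 + ["graph"] * 3 + [None] * 2 + ["statement"] * 2
fixations = ["input", "input", "graph", "statement", "graph", "input"]
proportions = dwell_proportions(samples)
transitions = count_transitions(fixations)
```

In this toy sequence, the gaze moves twice between graph and problem statement and twice between input and graph; consecutive fixations within the same AOI do not count as transitions.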
The proportion of speech was computed from the recorded speech signal, which was sampled at 44 kHz. A student was considered to speak when the amplitude (A) of the signal exceeded a threshold and when two consecutive speech samples above this threshold were located less than a given number of samples (n_s) apart. Limits for A and n_s were set to 0.015 (relative intensity) and 440 samples (i.e., 10 ms), respectively. Every part of the speech signal that was not detected as speech by the above definition was considered to be "silence".
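One possible reading of this definition is sketched below in Python. The function name and the toy signal are hypothetical, and two interpretive assumptions are made: every sample whose absolute amplitude exceeds the threshold counts as speech, and the samples lying between two supra-threshold samples that are fewer than n_s samples apart are also counted as speech. The original implementation may differ in these details.

```python
def proportion_of_silence(signal, amp_threshold=0.015, max_gap=440):
    """Return the share of samples classified as silence.

    signal        -- sequence of relative amplitudes
    amp_threshold -- amplitude limit A (0.015 in the study)
    max_gap       -- sample limit n_s (440 samples in the study)
    """
    n = len(signal)
    if n == 0:
        return 0.0
    # Indices of samples whose absolute amplitude exceeds the threshold.
    loud = [i for i, value in enumerate(signal) if abs(value) > amp_threshold]
    speech_samples = 0
    for k, index in enumerate(loud):
        speech_samples += 1  # the supra-threshold sample itself
        if k + 1 < len(loud):
            gap = loud[k + 1] - index
            if gap < max_gap:
                # Assumption: short gaps between loud samples are bridged as speech.
                speech_samples += gap - 1
    return 1.0 - speech_samples / n


# Hypothetical toy signal: ten samples with two loud samples three samples apart.
toy_signal = [0.0] * 10
toy_signal[2] = 0.1
toy_signal[5] = 0.1
bridged = proportion_of_silence(toy_signal, max_gap=4)      # gap of 3 is bridged
not_bridged = proportion_of_silence(toy_signal, max_gap=3)  # gap of 3 is not bridged
```

With the larger gap limit the two loud samples and the two samples between them count as speech; with the smaller limit only the two loud samples do.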
Fig. 2 An example of a stimulus (P7) used in the multimedia condition of the experiment, which has three overlaid areas of interest (AOIs): input, problem statement, and graph
The recorded speech was further analyzed by first transcribing it to text format, and then coding it into 'idea units' according to the scheme in the Appendix (Table 10). The main categories in the coding scheme are based on Mayer's CTML (2005b) and thus refer to the cognitive processes assumed by this theory: searching and selecting of information from input and graph, activating prior knowledge, integrating information from different sources, and the final problem solution. In line with van Gog et al. (2005), who also investigated cognitive processes involved in problem-solving, we included meta-cognitive processes. The actual coding was conducted by two raters for 10 % of all data. Their inter-rater reliability was above 70 %, calculated as the number of matching codes relative to the total number of codes in this 10 % of the data. Since the inter-rater reliability was sufficiently high (i.e., higher than 0.70, van Someren et al. 1994), one of the raters coded the remaining data.
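The agreement measure described above (matching codes divided by the total number of codes) can be written as a short Python sketch. The function name and the code labels in the example are hypothetical, and the sketch assumes that both raters coded the same, already aligned sequence of idea units.

```python
def percent_agreement(codes_rater_a, codes_rater_b):
    """Proportion of idea units to which both raters assigned the same code."""
    if len(codes_rater_a) != len(codes_rater_b):
        raise ValueError("Both raters must code the same idea units.")
    if not codes_rater_a:
        raise ValueError("At least one coded idea unit is required.")
    matches = sum(a == b for a, b in zip(codes_rater_a, codes_rater_b))
    return matches / len(codes_rater_a)


# Hypothetical codes for five idea units; the labels are illustrative only.
rater_a = ["select_graph", "prior_knowledge", "integrate", "solution", "meta"]
rater_b = ["select_graph", "prior_knowledge", "integrate", "solution", "select_input"]
agreement = percent_agreement(rater_a, rater_b)
```

Simple percent agreement does not correct for chance agreement; it is shown here only because it matches the calculation described in the text.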
Data were analyzed with linear mixed effects models using R 2.15.2 (R Development Core Team 2008) and the packages lme4 (Bates et al. 2012) and languageR (Baayen 2011). Participants and problems were modelled as random factors in all analyses.

Results
The results are presented in the order of the hypotheses in Sect. 1.5.
Participants solving problems with graphs answered 56 % of the problems correctly, compared to 52 % for participants without graphs. Table 1 shows the result of a multi-level logistic regression predicting a correct answer based on the presentation condition. As can be seen from the table, there is no statistically significant effect of presentation condition on students' ability to answer the problems correctly.

Picture bias (H2)
To test whether there was a confirmation bias when graphs were present, information about whether the correct answer is true or false was included in the regression. The output can be seen in Table 2.
The analysis reveals that participants were more likely to answer the problem statement correctly if it was true and, interestingly, that there is a significant interaction between presentation condition and whether the answer is true or false. As illustrated in Fig. 3, it appears as if the students were more likely to answer correctly when the answer was true and a graph was present, compared to when the answer was false. In contrast, whether the answer was true or false had no influence when the graph was not present. A post-hoc multiple comparison revealed only one marginally significant difference, which occurred between the two conditions when the answer is true (p = 0.056). Additional support for a picture bias is provided in Table 3, which shows that presentation condition is a significant predictor for providing a confirmatory answer.
Table 1 note: Here 'withoutgraph' refers to the problems without graphs. The sign of the 'Estimate' tells us that the condition with graphs led to a higher proportion of correct answers. However, the effect is not significant since the value of 'Pr(>|z|)' is above 0.05
Since the nature of the answer (true or false) turned out to significantly predict the proportion of correct answers, this predictor was included in all further statistical models.

Search and selection (H3a)
The overall small effect graphs had on comprehension raises the question of how the students utilize the additional graphical information. Given the similar performance results, it is tempting to believe that they did not spend much time on the graphs but, as in the nonillustrated condition, inspected only the text and the equations in the input and problem statement areas. At the same time, the interaction between whether the answer was true or false and the presentation condition (with or without graph) suggests that the graph influenced the students' problem solving processes.
Overall, the students spent a fairly large proportion of the total time viewing the graphs (19.0 ± 10.2 %). As can be seen from Fig. 4, it appears as if the graph is inspected at the expense of the input and the problem statement, in such a way that equal amounts of time are taken from each of these regions. The proportion of total dwell time on both the input and problem statement was significantly smaller when the graph was present, according to a two-sample t test (p < 0.001). Moreover, a similar test between the quotients of 'input' and 'problem statement' for problems with and without graphs did not come out significant (p > 0.05) for any of the problems. Given that a significant portion of time is spent visually inspecting the graph, does a longer inspection time also lead to better performance? On average, there were small differences in total dwell time on the graph when answering correctly (M = 19.9, SD = 11.0 %) compared to incorrect answers (M = 18.6, SD = 10.8 %) and, as seen in Fig. 5, there was no relationship between whether participants answered correctly and how much time they spent looking at the graph. This is confirmed statistically by the results reported in Table 4.
Fig. 3 Illustration of the interaction between whether the problem statement is true or false in the multimedia (with graph) and control (without graph) conditions. Error bars represent standard errors
Previous research has shown that a good problem solving strategy is to read the problem formulation carefully before moving on to other parts of the problem (Andrà et al. 2009). However, Fig. 6 shows that performance is inversely related to the proportion of dwell time on the input area in a problem. Students who answered correctly looked at the input 42.6 % (SD = 11.1) of the time whereas those who answered incorrectly spent 47.0 % (SD = 12.1) of the time inspecting the input. A smaller proportion of dwell time on the input significantly predicts an increase in performance (cf. Table 5).
As shown in Fig. 7, the students dwelled proportionally longer on the problem statement when giving a correct answer (M = 42.6, SD = 12.8 %) compared to an incorrect answer (M = 38.3, SD = 11.3 %), and the total dwell time on the statement was a significant predictor for a correct answer (cf. Table 6).

Integration (H3b)
It could be that a long dwell time on the graph by itself does not help students' problem solving, but rather how they integrate the graph with other parts of the problem, i.e., the regions labeled as input and problem statement before. Figure 8 illustrates how performance is related to the number of transitions between different areas in the problem.
As shown in Table 7, there is a marginally significant effect (p = 0.08): the number of transitions between the graph and the problem statement was higher for students who answered a problem correctly. No significant differences were found for the other transitions in Fig. 8.
To estimate whether the illustrated problems required more mental effort, the proportion of silence was calculated from the verbal data. Figure 9 shows that participants consistently speak less when the problem includes a graph; the proportion of silence increases from 62.3 % (SD = 7.0) without a graph to 66.3 % (SD = 9.1) for participants in the multimedia condition. As shown in Table 8, there is a marginally significant effect of presentation condition on the proportion of time students speak (p = 0.07).
Table 4 note: The proportion of dwell time on the graph is included as a factor. Note that the proportion of dwell time is logit-transformed before being used in the model

Results verbal data: two contrasting cases (RQ1)
In this section we compare the two most extreme cases from our experiment with respect to verbal data: problem P3, for which the presence of a graph improved the results the most, and problem P4, for which the graph helped the least (cf. Fig. 10). Here we report analyses of verbal data based on the coding scheme described in the Appendix (Table 10). Table 9 shows the normalized frequency of each code in relation to the two contrasting problems. In addition to the verbal analysis, we report how confident students were in their answers.
Figure note: Each × corresponds to one of the 36 participants, and the line represents a linear fit of the data
Table 5 note: The proportion of dwell time on the input is included as a factor. Note that the proportion of dwell time is logit-transformed before being used in the model
Table 6 note: The proportion of dwell time on the problem statement is included as a factor. Note that the proportion of dwell time is logit-transformed before being used in the model
Table 7 note: The number of transitions between graph and problem statement is included as a factor. Here '.' indicates that an effect is marginally significant

Effect of graph presence
To estimate the effect of the graph, we calculated the difference between the codes in the multimedia and the control condition across both problems and then picked the ten largest differences. Four of these differences were related to the graph: when a graph was present, participants selected more information from the graph (1.25), integrated more information from the graph with the statement (0.50) and with the input (0.90), and built more mental models based on the graph (0.65). Hence, the participants made active use of the graphs. However, the presence of a graph influenced not only its own use, but also the use of the other problem elements, in both positive and negative ways. On the positive side for performance in the multimedia condition, participants selected more information from the statement when a graph was present (0.66). Since we found that a proportionally longer dwell time on the statement was related to higher performance, more information selection from the statement can be seen as a positive effect of the graph. Moreover, the participants evaluated to a higher extent whether or not the statement (0.29) and the input (0.58) were correct. At the same time, they used less prior knowledge (-1.09) and integrated information with the input and the statement less frequently (-0.50), which in turn is probably a negative effect of the graph. Furthermore, participants evaluated their own knowledge more positively (0.31), which may also be problematic. In summary, adding a graph seems to have both positive and negative effects on the processes underlying problem solving.
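The selection step can be sketched in Python as follows. The function names, code labels, and numbers are hypothetical and do not reproduce Table 9. The sketch assumes that code frequencies are normalized by the number of participants in each condition and that "largest" refers to the absolute size of the difference, since both positive and negative differences are reported.

```python
def normalized_frequency(code_counts, n_participants):
    """Number of occurrences of each code divided by the number of participants."""
    return {code: count / n_participants for code, count in code_counts.items()}


def largest_differences(with_graph, without_graph, k=10):
    """Return the k codes with the largest absolute difference (with minus without)."""
    codes = set(with_graph) | set(without_graph)
    diffs = {
        code: with_graph.get(code, 0.0) - without_graph.get(code, 0.0)
        for code in codes
    }
    # Sort by absolute difference (descending), then by code name for stable output.
    ranked = sorted(diffs.items(), key=lambda item: (-abs(item[1]), item[0]))
    return ranked[:k]


# Hypothetical normalized frequencies, not the values reported in the study.
with_graph = {"select_graph": 2.0, "prior_knowledge": 0.5, "solution": 1.0}
without_graph = {"select_graph": 0.5, "prior_knowledge": 1.75, "solution": 1.0}
top_two = largest_differences(with_graph, without_graph, k=2)
```

Ranking by absolute value keeps strong decreases, such as reduced use of prior knowledge, in the list alongside strong increases.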

Effect of helpfulness of the graph
Fig. 10 Two contrasting cases. a P3, including the graph that improved the performance the most, and b P4, including the graph that helped the least. For improved readability, the text in the stimuli has been reproduced in the figure captions
Table note (code frequencies for the two contrasting problems): Each number represents the number of codes for all participants divided by the number of participants. 'Diff' represents the difference WithGraph - WithoutGraph for Problem 3, Problem 4, and Problem 3 - Problem 4. The numbers in the last column represent differences between the two problems in the multimedia condition (with a graph). For definitions and explanations of the codes, cf. Appendix. Color-coded codes represent the largest differences between the multimedia and control condition. Blue color indicates differences unrelated to graphs whereas red color indicates differences related to graphs. To make the table more readable and compact, WithGraph is denoted 'Graph' and WithoutGraph is denoted 'noGraph'.
To further investigate the effects of a graph, we compared the use of graphs for two contrasting cases: when the graph was most helpful and when it was most harmful. To this end, we calculated the difference between the two problems in the multimedia condition. Again, we chose the ten biggest differences. Three of these differences were directly related to the use of graphs. We found that when the graph was helpful, it was selected to a higher extent (0.55), more integrated with prior knowledge (0.5), and participants built more mental models from it (0.55) compared to when the graph was harmful. Hence, participants made more active use of the graph when it was helpful. Moreover, the graph also affected the processing of the other information sources: when the graph was helpful, participants selected less information from the input (-2.8) and from the statement (-3.6), integrated less information from the statement and the input (-0.35), but built more mental models based on the input (0.35).
Thus, when the graph was helpful, participants extracted less information from other sources, but still used these more actively. Moreover, when the graph was helpful, participants evaluated their own knowledge more, both in positive (0.3) and negative (0.6) terms.
Furthermore, we conducted a more qualitative analysis of these two contrasting cases.
Problem 3: When the graph was most helpful. Counting how often the keyword sphere (or circle) occurs in the verbal data suggests that the graph [see Fig. 1(b)] directly supported the most common way of solving this problem. That is, to mentally picture the dashed arrow in Fig. 1(b) (or its oppositely oriented counterpart), resulting from the vector R(t) moving along the trajectory, and to finally recognize that the dashed vector is tangential to the sphere and hence orthogonal to its radius. In the multimedia condition 50 % of the participants used the keyword while reasoning about the problem, and 90 % of these gave the correct answer 'true'. In the control group only 6 % uttered the keyword.
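The geometric argument about tangency can be written out in one line. As a sketch only, assume that the trajectory lies on a sphere of radius c centred at the origin, so that |R(t)| = c for all t; the exact wording of problem P3 is not reproduced here, so this setting and the symbol c are assumptions.

```latex
\[
\mathbf{R}(t)\cdot\mathbf{R}(t) = c^{2}
\;\Longrightarrow\;
\frac{d}{dt}\bigl(\mathbf{R}(t)\cdot\mathbf{R}(t)\bigr)
  = 2\,\mathbf{R}(t)\cdot\mathbf{R}'(t) = 0
\;\Longrightarrow\;
\mathbf{R}'(t)\perp\mathbf{R}(t).
\]
```

Under this assumption the derivative R'(t), which points along the trajectory, is orthogonal to the radius vector R(t), which is the relation the participants appear to have read off the graph.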
The confidence scores for problem P3 support the view that a majority of the participants who answered true had actually solved the problem correctly: among the students having access to the graph, confidence was higher (M = 5.3) for students who answered true, which is the correct answer, than for those who answered false (M = 4.3). Similarly, for the group not having access to the graph, confidence was also higher (M = 4.6) for those who answered the problem correctly than for those who did not (M = 4.0). In summary, it seems that participants in the multimedia condition to a large extent actively used the graph to solve this problem, and were also confident about their solutions.
Problem 4: When the graph was most harmful. The graph in P4, related to the Gauss formula, illustrates that material being created within a volume is equal to the flow of material through the boundary of that volume. Hence, this interpretation of the Gauss formula is expected to be clearer for the group having access to the graph. In the verbal analysis we found that 25 % of the participants in the multimedia condition commented on this interpretation, compared with 19 % in the control group. More interestingly, we found that 60 % in the multimedia condition said (something similar to) 'this must be correct', while in the control condition such statements were uttered by only 19 % of the participants.
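For reference, the interpretation above corresponds to the standard form of the Gauss formula (divergence theorem). The symbols F (a vector field), V (the volume), and n (the outward unit normal) are chosen here for illustration; they are not taken from the stimulus, and the false statement that participants had to judge in P4 is not reproduced.

```latex
\[
\iiint_{V} \nabla\cdot\mathbf{F}\,dV
\;=\;
\iint_{\partial V} \mathbf{F}\cdot\mathbf{n}\,dS
\]
```

The left-hand side sums what is created inside the volume, and the right-hand side measures the net flow through its boundary.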
Turning to the confidence scores, we found that, for students in the multimedia condition, confidence was higher (M = 4.4) among students who answered true, which is the wrong answer, than among those who correctly answered false (M = 3.0). In contrast, in the control condition, confidence was higher (M = 4.6) for those who answered the problem correctly (i.e., false) than for those who did not (M = 4.0).
Taken together, participants seem to be more likely to confirm a statement, and more confident in doing so, when a difficult problem is accompanied by a graph.

Discussion
In this study we investigated the multimedia effect in problem solving at the university level with examples taken from the field of vector calculus. We found no support for an overall multimedia effect (H1). Instead, graphs had a beneficial effect on performance only when problem statements were to be confirmed (instead of rejected), which is referred to as the picture bias effect (H2). With respect to H3, analyses of eye movement data showed that the graphs attracted students' visual attention at the expense of fewer looks toward other parts of the problem. Moreover, spending a proportionally long time inspecting the problem statement as well as frequently moving the eyes between the graph and the problem statement correlated with a higher performance.
Finally, analyses of verbal data provided further insights into why graphs can be both helpful and harmful. It was hypothesized (H3) that the students would actively use the graph in terms of utterances relating to the graph (H3a) and integration between the graph and other information sources (H3b). Results showed that when a graph was present participants indeed made active use of it, both in terms of selection and integration, and even more so when the graph was helpful. Interestingly, the presence of the graph also influenced the use of other information sources: participants made more use of the statement and evaluated the other data sources more. When the graph was particularly helpful, participants made a more focused (i.e., selecting less information from), but at the same time more efficient use (i.e., building mental models) of the other data sources. Moreover, with a graph, participants evaluated their own knowledge as being higher, confirming a picture biasing effect. When the graph was particularly helpful, though, they reflected more on their own knowledge. Finally, there was a systematic increase of silence in the multimedia condition (H3c), suggesting that students use more mental effort when solving problems that contain graphs.
Beneficial or biasing picture effect?
The graphs we used in the current study were designed to fulfill an interpretational function, that is, to represent complex information presented in text or formulae pictorially, and thereby support students' problem solving processes (Levin et al. 1987). We therefore expected to find an overall beneficial effect of adding graphs to problems, but no such effect was present in our data. From a theoretical point of view, the stimuli used in this study were designed in line with the temporal contiguity principle, that is, the pictorial and the textual material were presented at the same time. However, the design is not fully in line with the spatial contiguity principle (Mayer 2005b; also known as the split attention effect, Chandler and Sweller 1992). The graph and the explanatory input were given on different parts of the screen and hence might have caused unnecessary visual search for related information, which may explain the absence of a multimedia effect. Split attention may have led the students to invest more mental effort into integrating different parts of the problem, as suggested by the higher proportion of silence in the multimedia condition.
An alternative explanation for not having found a multimedia effect is that participants did not process the textual information, in particular the formula, in the phonological channel. In this way, they would have bypassed the benefits of the dual-processing assumption in working memory. However, post-hoc inspections of the eye-tracking recordings accompanied by the verbal reports of the participants showed that a vast majority of the participants verbally described what the mathematical formulas contained; many even read the formulas out loud. Consequently, it is likely that most participants indeed processed the textual information phonologically. Nevertheless, future research should explore when and under which circumstances textual information is actually processed phonologically.
Our results suggest that when seeing a graph, students are more likely to believe in the correctness of the accompanying statements. Students may recognize parts of the input and the graph, and parts (maybe only keywords) of the problem statement, and they then say something like "yes this is [for example] the triangle inequality, so this must be true". These results are in line with McCabe and Castel (2008), who found that including brain images in an article increased the scientific credibility of the results. They argue that this may be because the brain images "provide a physical basis for abstract cognitive processes". In this study, the graphs instead provide concrete physical interpretations of abstract mathematical formulae. Still, the graphs seem to have a similar persuasive power to affect whether a statement is believed or not.

Processes underlying text-picture integration
An important aspect of processing multimedia material is to select and integrate information relevant for the task (Mayer 2005b). We used eye tracking to investigate how information was visually selected, i.e., where the students looked, for how long, and how information was integrated, that is, how often they transitioned between different problem areas. When the graph was present, students spent about 20 % of their time looking at it. As a result, they paid proportionally less attention to the input and the to-be-confirmed or rejected problem statement. The proportion of time looking at the graph was not related to performance. Interestingly, the more students looked at the problem statement, and the less they looked at the input, the better they performed. Furthermore, the more students switched their attention between the problem statement and the graph, the better they performed. Thus, the mere presence of a graph that is related to the input is not necessarily helpful. Instead, the graph needs to be integrated with the to-be-confirmed or rejected statement.
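The two eye-movement measures discussed above, the proportion of dwell time per problem area and the number of transitions between two areas, can be illustrated with a small Python sketch. It assumes that the gaze data have already been reduced to an ordered list of (area label, duration) pairs; the labels, durations, and function names are hypothetical and do not reflect the analysis software used in the study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One fixation: (area-of-interest label, duration in milliseconds).
Fixation = Tuple[str, float]


def dwell_proportions(fixations: List[Fixation]) -> Dict[str, float]:
    """Share of total fixation time spent on each area of interest."""
    totals: Dict[str, float] = defaultdict(float)
    for area, duration in fixations:
        totals[area] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {area: t / grand_total for area, t in totals.items()}


def count_transitions(fixations: List[Fixation], a: str, b: str) -> int:
    """Number of direct gaze shifts between areas a and b, in either direction."""
    labels = [area for area, _ in fixations]
    return sum(1 for prev, cur in zip(labels, labels[1:]) if {prev, cur} == {a, b})


# Hypothetical fixation sequence for one participant on one problem.
sequence = [
    ("input", 300.0),
    ("graph", 200.0),
    ("statement", 250.0),
    ("graph", 150.0),
    ("statement", 100.0),
]
proportions = dwell_proportions(sequence)
graph_statement_transitions = count_transitions(sequence, "graph", "statement")
```

In this toy sequence the participant spends 35 % of the time on the graph and moves three times between graph and statement; the shift from input to graph is not counted because it does not involve the statement.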
Analyses of verbal data revealed that participants in the multimedia condition were silent more often than participants in the control condition without graphs. As silent pauses are indicators of increased mental effort (Yin and Chen 2007; Jarodzka et al. 2015), adding graphs to these problems could have increased the amount of mental effort for students. One explanation for this is that the number of elements in the task increased (i.e., the intrinsic load). A qualitative analysis of two contrasting problems revealed that in the problem where the graph was beneficial it provided students with a representation that was helpful to solve the problem. In the problem where it was most harmful, the graph itself was correct, but the problem statement was not. Still, the graph convinced the students to confirm the statement.

Implications for theory and educational practice
As a practical consequence, we can conclude that when graphs are included in textbooks, it should be ensured that students first and foremost know exactly what their task is (here: confirm or reject problem statements), so that they know how to use these graphs. Next, students should keep the task itself in mind by integrating the task formulation and the graph. Thus, when designing textbooks, it could be important to consider these integration processes. Future work should investigate different ways to facilitate integration, for example by referring to the graph in the statement and maybe even back from the graph to the problem statement.
Furthermore, implications can also be drawn for theory. Mayer's (2005b) theory of processing multimedia information clearly describes an optimal scenario, where students actively process all given information by selecting the relevant information, organizing it, and integrating it. However, in line with other research (e.g., Holsanova et al. 2009), our study showed that students may simply not take the effort to actively process information and instead use a rather shallow processing strategy (e.g., assuming that when the graph is correct, the rest of the task must also be). In this way, pictures could even support such shallow and misleading processing. The CTML does acknowledge that this optimal way of processing can be hampered by different layout decisions and has thus formulated several design guidelines. Based on the findings in this paper, we suggest that the influence of a picture bias effect should be considered carefully alongside such guidelines.

Limitations and conclusions
It is evident from discussions we had with students after the test that the experiment does not precisely reflect how they normally work with problems of this type at home, in the classroom, or at examinations. First, the time to solve a problem was limited and rather short. Such time pressure may lead to shallower information processing, and therefore a greater picture bias. Second, they were not allowed to use pen and paper to scribble formulas and figures to organize their problem solving processes. Finally, these students are typically not exposed to problems where statements need to be falsified, in particular when the information is not presented in their native language. The implications of using this rather uncommon format for providing the answer need further investigation. This makes it challenging to construct suitable problems and graphs for these types of studies. Nevertheless, the format of the test is still common in other domains.
In the eye-tracking data and the verbal reports, we observed examples of deep processing of the information included in the problems, such as building a rich mental model. However, the current data analysis does not provide concrete evidence for such processing. Future research should investigate this issue in a qualitative manner.
In summary, graphs were not found to be beneficial per se in the experiment. Only when they were carefully framed and integrated with the problem statement did they have a beneficial effect on performance. Otherwise, even though the graphs were correct by themselves, they misled the students into trusting the problem statements. Either way, the graphs produced an increase in mental effort. Before including graphs in mathematical texts, teachers and textbook designers should very carefully consider their function and how they integrate with other parts of the information in the problem.