Is too much help an obstacle? Effects of interactivity and cognitive style on learning with dynamic versus non-dynamic visualizations with narrative explanations

The aim of this study was to investigate the role of visual/verbal cognitive style and level of interactivity in dynamic and non-dynamic multimedia learning environments. A group of 235 biology students learned about photosynthesis either from a computer-based animation or from a series of static pictures, both accompanied by spoken explanatory text. Participants were randomly assigned to one of two conditions: with or without the possibility to pause, play, or fast-forward/rewind the learning environment (self-paced versus system-paced condition). Participants obtained better results when learning with the system-paced environment than with the self-paced one. A significant triple interaction between cognitive style, type of pacing, and type of visualization showed that highly developed visualizers learned less with self-paced static pictures than with system-paced static pictures. There were no significant effects regarding verbal cognitive style. The results shed more light on the relation between different levels of interactivity and visual cognitive style when learning from static pictures.

Contemporary research on learning with multimedia environments shows that the design of multimedia materials has an impact on learning outcomes (e.g., Mayer 2017). Among the factors influencing the learning process, type of visualization (dynamic versus non-dynamic visualizations: e.g., Höffler and Leutner 2007; Tversky et al. 2002), type of interactivity (system-pacing versus self-pacing: e.g., Hegarty 2004; Schwan and Riempp 2004), and type of modality (written explanatory text versus spoken narration: e.g., Mayer 2008, 2014) have been shown to have an impact. The modality effect demonstrates that using spoken narration facilitates learning from multimedia environments. It does, however, depend on the type of interactivity/pacing (Tabbers 2002): if the pacing of the animation is determined by the system, auditory narration leads to better learning outcomes than written text; but if learners control the pacing themselves, the reverse holds. Additionally, a study by Höffler and Schwartz (2011) showed that self-pacing is more beneficial when learning from animations, whereas system-pacing is more favourable for learning from static pictures.
At the same time, the body of research on the role of individual differences in learning with multimedia environments continues to grow (e.g., Höffler and Leutner 2011; Huk 2006; Kalyuga 2008; Koć-Januchta et al. 2017). Since multimedia learning involves both visual and verbal processing of information, visual-verbal cognitive style may be an influencing factor in multimedia research. Although the results of research on visual-verbal cognitive style are rather inconsistent and have been criticized in the past (cf. Kirschner and van Merriënboer 2013; Massa and Mayer 2006), recent findings from eye-tracking studies seem at least to confirm the existence of cognitive styles. These studies have shown that participants who differed in their self-described visual and verbal cognitive style also differed in their eye-movement patterns when viewing pictorial and textual stimuli (Koć-Januchta et al. 2017; Mehigan et al. 2011; Tsianos et al. 2009). According to the eye-mind assumption (Just and Carpenter 1987), measures derived from eye tracking can be linked with cognitive processes such as attention and comprehension (Duchowski 2007; Scheiter et al. 2017). Hence, observed differences in how people with a more or less developed visual or verbal style view multimedia stimuli may reflect differences in their information processing. Moreover, research provides some evidence that people with different cognitive styles differ in their neurophysiological patterns (e.g., Jawed et al. 2018; Kraemer et al. 2009, 2014). The aim of the current study was to take a closer look at the possible role of visual-verbal cognitive style with respect to the above-mentioned, well-investigated factors influencing multimedia learning: type of visualization, type of interactivity, and (to a lesser extent) type of modality.
Thus, we examined the impact of cognitive style on learning with different types of visualizations (static pictures, animations) at different levels of interactivity (self-pacing, system-pacing) while controlling for type of modality (the same spoken explanatory text in all conditions).

Learning with multimedia
Many empirical studies have confirmed the beneficial effect of learning with multimedia representations (e.g., Carney and Levin 2002; Clark and Paivio 1991; Mayer 2008, 2014, 2017; Wittrock 1978, 1989). Learning from a combination of pictures and text leads to better comprehension than learning from words alone (Mayer 2014). In the series of studies reviewed by Mayer (2003), the multimedia effect was shown across different learning environments (dynamic and non-dynamic). These results have contributed to three basic assumptions of the Cognitive Theory of Multimedia Learning (Mayer 2003, 2014; Mayer and Moreno 2003): (1) The dual-channel assumption states that people process information in two different mental channels (systems): visual/pictorial and auditory/verbal. This assumption is based on dual-coding theory (Paivio 1978, 1986). According to this theory, one channel processes information presented verbally (e.g., text), while the other processes information presented non-verbally (e.g., pictures). Additionally, the dual-channel assumption draws on Baddeley's model of working memory (Baddeley 1992, 2003), which distinguishes between information processed initially through the eyes (e.g., written text, pictures) and through the ears (e.g., spoken text; cf. Mayer 2014). The Cognitive Theory of Multimedia Learning combines these two approaches and states that information can be presented to a learner visually (e.g., pictures) or auditorily (e.g., narration) and is organized into a verbal or a non-verbal/pictorial model in working memory (Mayer 2014).
(2) The limited-capacity assumption states that the cognitive capacity of these two channels (systems) is restricted (Baddeley 1992, 2003; Chandler and Sweller 1991; Paas and Sweller 2014). That is, the amount of information a person can deal with cognitively in each channel at one time is limited (cf. Mayer 2014). (3) The active-learning assumption states that, for meaningful learning to occur, learners should actively process the information presented to them: selecting relevant information, organising it into pictorial and verbal representations, and integrating these representations with prior knowledge (Mayer 2003, 2008, 2014).

Dynamic and non-dynamic multimedia representations and level of interactivity
The results of a meta-analysis conducted by Höffler and Leutner (2007) support the assumption that animations are more beneficial to learners than static pictures (the effect size on learning outcome was d = 0.37). However, many studies do not confirm the superiority of animations over static pictures (e.g., Lewalter 2003; Mayer et al. 2005). A narrative review by Tversky et al. (2002) did not show any systematic advantage of animations. From the point of view of cognitive load theory, animations may even have some disadvantages. Cognitive load theory (Sweller 1994; van Merriënboer and Sweller 2005) is a model of the mental effort linked to the learning process (Paas et al. 2004). The theory distinguishes three types of cognitive load: intrinsic, germane, and extraneous. Importantly, the sum of these three types of load must not exceed the working memory capacity of learners. While intrinsic and germane load are unavoidable costs of meaningful learning, extraneous load results from the way information is conveyed; it can and should be reduced in order to free resources. Because of their transient nature, animations could therefore increase extraneous cognitive load, possibly even to the point of "cognitive overload" (cf. Just and Carpenter 1987; Hegarty 2004; Kalyuga 2008). Cognitive overload in multimedia learning can be prevented by following several guidelines (cf. multimedia learning principles) concerning the way multimedia environments are constructed (Mayer 2008, 2017). One of them, the modality principle, states that learning from pictures or animations with spoken narration instead of written explanatory text fosters comprehension. Processing pictures or animations and on-screen text at the same time can cause cognitive overload, as both the pictures/animation and the written text are processed (in the case of text, at least initially) in the learners' visual channel (Mayer 2008).
Learning from pictures or animations with spoken narration is less cognitively demanding and may minimize cognitive load, since pictorial information is processed in the visual channel while words are processed in the auditory (verbal) channel. Moreover, combining pictorial information with spoken narration may enable a more efficient use of working memory resources, again because narration and pictures/animation can be processed simultaneously in two separate channels (Baddeley 1992; Mayer 2008; Mayer and Moreno 2003; Mayer 1999, 2002). However, other authors suggest that the main reason for the modality effect is not processing in two separate channels but rather the simpler visual presentation, as learners do not have to split their attention between pictures and text (Liu et al. 2019; Tabbers 2002). In either case, the combination of narration and animation/pictures seems to foster learning because it requires less mental effort (lower cognitive load) and allows more efficient use of working memory resources (simultaneous processing of pictorial and verbal information).
Another way to reduce mental effort is to give learners the possibility to control the pace and sequence of the information presented (Tabbers and de Koeijer 2010). Self-paced environments allow learners to adjust the presentation speed to their needs and, for example, to stop, rewind, or fast-forward more complicated parts of the learning environment in order to study them more closely (Schnotz and Lowe 2008). The opposite of self-pacing is system-pacing, in which the speed of information presentation is pre-set by the system (cf. Lawless and Brown 1997). Studies have shown that system-paced learning environments are cognitively demanding, since preventing re-inspection of the learning material can inhibit comprehension (e.g., Hegarty 1992, 2004; Mason et al. 2013) and cause learners to overlook important information (Ainsworth and Van Labeke 2004). Additionally, a large body of evidence shows a beneficial impact of self-pacing when learning with computer-based environments (e.g., Höffler and Schwartz 2011; Mayer and Chandler 2001; Schwan and Riempp 2004; Tabbers and de Koeijer 2010). For example, there is some evidence that self-pacing promotes the construction of mental models (Schnotz and Lowe 2008).
On the other hand, interactivity may sometimes induce cognitive overload and, as a result, inhibit learning (Moreno and Mayer 2007). For example, some researchers have argued that, in self-paced designs, the range of implemented interactivity must be carefully considered, since an interface that is too complicated or offers too many interactive options can lead to cognitive overload (cf. Chandler 2004; Sweller et al. 1998; Scheiter and Gerjets 2007).
New light was shed on interactivity by Tabbers et al. (2001), who compared system-paced and self-paced multimedia learning environments accompanied by either written explanatory text or spoken narration. They showed that the most beneficial combinations of pacing and modality for learning are spoken narration with system-paced learning materials and written explanatory text with self-paced learning materials. Comparable results were reported by Tabbers (2002), who showed that students performed better on a transfer test when learning either from a system-paced multimedia instruction with spoken narration or from a self-paced multimedia instruction with written explanatory text. Additionally, learning with spoken narration resulted in lower cognitive load than learning with written text.

Visual-verbal cognitive style in multimedia learning
The efficacy of the type of design and the level of interactivity has also been considered in light of different learner characteristics. It has been shown that the design of multimedia materials and learner characteristics indeed interact (cf. Höffler and Schwartz 2011; Scheiter and Gerjets 2007). Given that multimedia learning environments consist of pictorial and textual components and thus directly engage visual and verbal cognitive processing (cf. Mayer 2008; Paivio 1986), the impact of the visual-verbal dimension on learning with multimedia is of compelling interest. Research on visual-verbal cognitive style indicates that some people are better at using the visual channel and tend to think in pictures (visualizers), while others are better at using the verbal channel and tend to think in words (verbalizers; Mayer and Massa 2003; see also dual-coding theory, Paivio 1978, 1986). Some studies have confirmed a relation between the visual-verbal dimension and learning outcome in multimedia environments. Plass et al. (1998) showed that visualizers profited more than verbalizers from pictorial explanations accompanying a text, while verbalizers profited more from textual explanations. Riding and Douglas (1993) reported similar effects. Recent eye-tracking studies showed that visual-verbal cognitive style manifests itself in different patterns of viewing multimedia stimuli (cf. Koć-Januchta et al. 2017; Mehigan et al. 2011). For example, in the study by Koć-Januchta et al. (2017), visualizers spent significantly longer inspecting pictures, while verbalizers mostly viewed the text. Additionally, verbalizers shifted their attention towards irrelevant picture components sooner than visualizers did and scored lower than visualizers on a comprehension test. On the other hand, in the study by Massa and Mayer (2006), visualizers and verbalizers did not differ in learning outcome.
Visual-verbal cognitive style in relation to type of visualization (animation versus static pictures) and system- versus self-pacing was also addressed in a study by Höffler and Schwartz (2011). Visualizers performed better when learning with animations than when learning with static pictures. This finding is somewhat at odds with other findings (cf. Höffler et al. 2010), where static pictures were more beneficial for visualizers than animations. The crucial difference between the two studies might be the modality of the text: the first study used spoken narration for both animations and static pictures, while the latter relied on written text.
Further results from the first study (Höffler and Schwartz 2011) showed that self-pacing was especially beneficial when learning with animations, whereas the system-paced condition resulted in better learning outcomes when learning with static pictures. Correspondingly, cognitive load was higher with system-pacing for animations and with self-pacing for static pictures. However, the authors did not obtain a triple interaction (type of pacing × type of visualization × visual-verbal cognitive style).

Objectives of the study
The main objective of the study was to extend the findings of Tabbers (2002), Tabbers et al. (2001), and Höffler and Schwartz (2011) regarding the types of pacing and visualization, modality, and visual-verbal cognitive style. To do so, we developed a 10-min multimedia lesson with spoken narration, similar to a real learning situation and addressing a complex topic, in order to answer the following research questions:
• According to Tabbers (2002) and Tabbers et al. (2001), spoken narration is beneficial when learning with a system-paced design, while written text is more advantageous with a self-paced design. Other studies show a general beneficial effect of self-pacing (e.g., Mayer and Chandler 2001; Schwan and Riempp 2004; Tabbers and de Koeijer 2010).
Will we find overall higher learning outcomes and lower cognitive load (mental effort) in the system-paced condition than in the self-paced condition?
• Tabbers et al.'s (2001) findings refer to the modality effect and interactivity level. Dual-coding theory (Paivio 1978, 1986) emphasizes the use of visual and verbal cognitive channels when processing information. However, some evidence suggests that people use these two channels to different extents and favour one of them for processing information (cf. Mayer 2008; Mayer and Massa 2003).
Will we observe differences in learning outcome as a result of a triple interaction of cognitive style (visual, verbal), type of pacing (self-pacing, system-pacing), and type of visualization (animation, static pictures)?
Learning outcome was measured as the sum of points received in the post-test. Consequently, we expected the following effects to be statistically significant:
• A main effect of type of pacing (system-paced learning environments will lead to higher learning outcomes than self-paced learning environments).
• A triple interaction of cognitive style, type of pacing, and type of visualization, indicating that the interaction effect of the treatments (type of pacing and type of visualization) is moderated by cognitive style.

Participants
Biology students (N = 235; 74.5% female) aged 18-35 years (M = 21.69, SD = 2.72) from two universities in Germany learned with a computer-based learning environment.

Learning environments
In a 2 × 2 design, four versions of a learning environment explaining the primary reactions of photosynthesis were developed to address the research questions: two animated versions (system-paced or self-paced) and two versions with static pictures (system-paced or self-paced). Each version started with an identical short introduction containing a written explanatory text (ca. 80 s) and lasted 10 min altogether (including the introduction). To control for time-on-task, every student in every condition interacted with the learning environment for 20 min, as both the system-paced and the self-paced versions were switched off after 20 min. During this time, participants could either view the entire animation or static picture version twice at a pre-set speed after clicking the play button (system-paced condition) or pause, rewind, or fast-forward the learning environment and restart it at any point, at any time, and as often as they wanted (self-paced condition). The need to allow participants 20 min of learning time was derived from pilot tests with a group of 30 biology students, which revealed that a single 10-min exposure to the learning environment did not lead to satisfactory learning effects. Both the animation and the static pictures were accompanied by a spoken explanatory narration but also contained some additional verbal components in the form of textual labels, such as biological terms or chemical formulas. Each version of the learning environment conveyed the same information. In the static pictures, movements were depicted by arrows, so learners had to imagine the actual movements of elements themselves (see Fig. 1: Photosystem II). In the animation, learners could directly see the elements moving.

Instruments and measures
Demographic information was collected with a questionnaire on participants' age, gender, current semester of enrolment, university major, and high-school GPA ("Abitur").
To measure visual-verbal cognitive style, the following two questionnaires were used:
• Individual Differences Questionnaire, IDQ (Paivio and Harshman 1983). The IDQ consists of ten statements to be answered on a 4-point Likert scale. It has two scales, the visual scale (Cronbach's α = .80) and the verbal scale (Cronbach's α = .79).
• Verbalizer-Visualizer Questionnaire, VVQ (Richardson 1977). The VVQ consists of 15 statements to be answered on a 4-point Likert scale. It has two scales, the visual scale (Cronbach's α = .65) and the verbal scale (Cronbach's α = .72).
Fig. 1 Screenshot from the self-paced static picture version of the learning environment, with a red arrow indicating the motion of an electron within Photosystem II (depicted in green). Note that in the animation the arrow is replaced by a vertical motion of electrons within Photosystem II, while all other elements of the visualization remain still.
Participants rated the cognitive load they experienced by answering two questions based on a scale developed by Paas (1992). The first question asked about the mental effort invested in learning, while the second targeted the difficulty of the learning environment. Both questions were answered on a 9-point scale (ranging from "very, very easy/very, very low mental effort" to "very, very difficult/very, very high mental effort"), giving a possible maximum of 18 points for the summed scale. Cronbach's alpha of the cognitive load scale was α = .72.
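The reliability values reported for the IDQ, VVQ, cognitive load, and learning outcome scales are Cronbach's alpha coefficients, α = k/(k − 1) · (1 − Σ σᵢ² / σ²_total), where k is the number of items and σ²_total is the variance of the summed score. The following Python sketch computes α from item columns and builds the summed two-item cognitive load score (range 2-18). It assumes each rating is coded 1-9; the function names and the five learners' ratings are hypothetical illustrations, not the study's data or scoring script.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one score per respondent each)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of respondents")
    # Total score per respondent across all items
    totals = [sum(col[i] for col in items) for i in range(n)]
    # Population variances; the n vs. n - 1 choice cancels in the ratio
    item_variance_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))


def cognitive_load_score(effort, difficulty):
    """Sum of two 9-point ratings (assumed coded 1..9), giving a 2..18 score."""
    for rating in (effort, difficulty):
        if not 1 <= rating <= 9:
            raise ValueError("ratings are assumed to lie on a 1-9 scale")
    return effort + difficulty


# Hypothetical ratings from five learners (not study data)
effort = [5, 7, 3, 6, 8]
difficulty = [4, 7, 2, 6, 7]
scores = [cognitive_load_score(e, d) for e, d in zip(effort, difficulty)]
alpha = cronbach_alpha([effort, difficulty])
```

Because only the ratio of variances enters the formula, using sample instead of population variances gives the same α.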
The topic-related prior knowledge was measured with three questions: one open question and two multiple choice/closed questions (with just one correct answer). The prior knowledge scale assessed the level of knowledge about photosynthesis among participants before the study (which was expected to be very low). Table 1 contains the items designed to assess prior knowledge on photosynthesis.
In order to measure the level of knowledge about photosynthesis after the learning session, we developed a learning outcome measure. The learning outcome measure consisted of 33 open and closed/multiple choice questions (each with one correct answer) regarding photosynthesis. Cronbach's alpha for the learning outcome scale was α = .80. The possible maximum score on the learning outcome measure was 56 points. Table 2 shows examples of items designed to assess learning outcome on photosynthesis.
Two independent raters rated all open questions assessing prior knowledge and learning outcome. In cases of disagreement, a shared decision was reached by discussion and the help of a third rater.

Procedure
The learning environments, questionnaires, and tests were all computer-based. First, participants answered demographic questions (see above) and completed the IDQ and VVQ questionnaires on visual-verbal cognitive style. Subsequently, they answered the prior knowledge questions and engaged with the learning environment for 20 min; participants were randomly assigned to one of the four versions of the learning environment. Finally, they rated their cognitive load and answered the learning outcome questions. Each participant worked at a separate computer with a headset. The whole procedure lasted about one hour: 5 min of introduction, ca. 30 min of answering pre- and post-test questions, 20 min of interacting with the learning environment, and 5 min for debriefing and questions.

Data analysis
Data analysis began with a principal component analysis (with oblimin rotation) of the four scales (two verbal and two visual) from the two cognitive style questionnaires (IDQ and VVQ). The analysis showed that the four scales loaded on two factors measuring either visual or verbal cognitive style. These two factors were independent of each other (r = −0.09, p = .194) and together accounted for 78% of the variance. We analysed the data within the framework of the General Linear Model (Horton 1978) with a sequential decomposition of variance and learning outcome as the dependent variable. Analyses were conducted separately with visual cognitive style and verbal cognitive style as covariates. In each analysis, the variance of the dependent variable was decomposed by entering the predictors into the linear model in the following sequence: (1) the covariate, (2) the treatment factors and their interaction, (3) the two interactions of cognitive style with each of the treatment factors, and (4) the triple interaction of cognitive style and the two treatment factors.
Example test items:
• In order to destroy the weeds in his garden, a gardener used the DCMU herbicide. This compound prevents electron transfer to plastoquinone. Consequences are the following: (a) NADPH+H⁺ is still being constructed, the proton gradient is being raised, and ATP synthesis comes to a standstill; (b) the water splitting in Photosystem II stops, NADPH+H⁺ and ATP are still being generated; (c) NADPH+H⁺ is not being generated anymore, no more protons are being pumped to the cytochrome b6f complex, ATP is not being generated anymore; (d) the water splitting in Photosystem II is still on, the proton gradient is being raised, ATP is being generated.
• What is the first donor of electrons in the light-dependent reactions? Where do the electrons land at the end? (Open question. 2 points max.)
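The sequential decomposition of variance described above corresponds to Type I (sequential) sums of squares: each term's sum of squares is the drop in residual sum of squares when that term is added to a model already containing all earlier terms. The pure-Python sketch below assumes ±1 effect coding for pacing and visualization and a continuous cognitive style score; the coding, the function names, and the 16 demo observations are illustrative assumptions, not the study's actual analysis script or data. With eight parameters (intercept, covariate, two factors, and four interaction terms) and N = 235, the error degrees of freedom are 235 − 8 = 227, which matches the F(1,227) values reported in the Results.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_ss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def sequential_ss(terms, y):
    """Type I sums of squares for (name, column) terms entered in order.

    Returns ([(name, ss), ...], residual SS of the full model)."""
    included = []
    previous = residual_ss(included, y)
    table = []
    for name, column in terms:
        included.append(column)
        current = residual_ss(included, y)
        table.append((name, previous - current))
        previous = current
    return table, previous


def study_terms(style, pacing, vis):
    """Entry order from the text: covariate, treatments, style x factor, triple."""
    def prod(a, b):
        return [x * z for x, z in zip(a, b)]

    pv = prod(pacing, vis)
    return [
        ("style", style),                                     # step 1
        ("pacing", pacing),                                   # step 2
        ("visualization", vis),
        ("pacing x visualization", pv),
        ("style x pacing", prod(style, pacing)),              # step 3
        ("style x visualization", prod(style, vis)),
        ("style x pacing x visualization", prod(style, pv)),  # step 4
    ]


def f_ratio(ss_effect, residual, n_obs, n_params, df_effect=1):
    """F for one term against the full-model error mean square."""
    return (ss_effect / df_effect) / (residual / (n_obs - n_params))


# Hypothetical demo data: 4 cells (pacing x visualization), 4 learners each
pacing = [1, 1, 1, 1, -1, -1, -1, -1] * 2
vis = [1, 1, -1, -1] * 4
style = [0.5, -1.2, 1.1, 0.3, -0.7, 1.5, -0.2, 0.9,
         1.3, -0.4, -1.1, 0.6, 0.2, -1.5, 0.8, -0.9]
score = [30, 27, 33, 29, 26, 35, 28, 31, 34, 29, 24, 30, 28, 22, 32, 27]
table, resid = sequential_ss(study_terms(style, pacing, vis), score)
f_triple = f_ratio(table[-1][1], resid, len(score), n_params=8)
```

Because each step only credits variance not already explained by earlier terms, reordering the terms generally changes the individual sums of squares, although they always add up to the total.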

Results
Since the participants were mostly female, we first checked whether there were systematic differences between males and females on the main variables such as visual style, verbal style, learning outcome and cognitive load, as this would have reduced the generalizability of the results. However, there were no significant mean differences (all p > .50). The participants' level of prior knowledge on the topic was very low, as expected (M = 1.07, SD = 1.02, with 10 as the possible maximum score).

Learning outcome
We performed analyses for the dependent measure of learning outcome, represented as the sum of points received for correct answers (with 56 as the possible maximum score). Analyses were conducted separately with each factor, visual cognitive style and verbal cognitive style, as the covariate.
As stated above, we expected a main effect of type of pacing (system-pacing superior to self-pacing) as well as a triple interaction of cognitive style, type of pacing, and type of visualization.

Results for learning outcome with visual cognitive style as covariate
In the analysis with visual cognitive style as the covariate, we found the following effects:
• The expected main effect of type of pacing, F(1,227) = 2.76, p (one-tailed, 1-df test) = .098/2 = .049, ηp² = .012; self-paced: M = 29.32, SD = 9.17; system-paced: M = 31.19, SD = 9.09.
• The expected triple interaction of visual cognitive style, type of pacing, and type of visualization, F(1,227) = 5.08, p = .025, ηp² = .022 (see Fig. 2).
Table 3 presents the results of all effects of the analysis, and Fig. 2 displays the significant triple interaction. Figure 2 shows that a higher visual cognitive style is associated with better learning outcomes when learning with system-paced static pictures, whereas a higher visual cognitive style is associated with poorer learning outcomes when learning with self-paced static pictures. For lower visual cognitive style, the opposite is true. Intriguingly, visual cognitive style seems to make no difference when learning with animations. In other words, visual cognitive style correlates negatively with learning outcome when learning with a self-paced learning environment based on static pictures (r = −.28). This correlation is weaker for both the self-paced and the system-paced animation (r = −.12 and r = −.20, respectively) and turns into a weak positive correlation for a system-paced learning environment based on static pictures (r = .19).
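The effect sizes above can be recovered from each F ratio and its degrees of freedom, since ηp² = SS_effect/(SS_effect + SS_error) = F·df_effect/(F·df_effect + df_error). Likewise, for a 1-df test of a directional hypothesis, the one-tailed p is half the two-tailed p when the effect lies in the predicted direction. The small Python check below is a sketch of this arithmetic; the function names are our own and not part of any statistics package.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def one_tailed_p(two_tailed_p, in_predicted_direction):
    """One-tailed p for a 1-df test of a directional hypothesis."""
    half = two_tailed_p / 2
    return half if in_predicted_direction else 1 - half


# F values reported above, each with df = (1, 227)
for label, f in [("pacing", 2.76), ("triple interaction", 5.08)]:
    print(label, round(partial_eta_squared(f, 1, 227), 3))
print(one_tailed_p(0.098, in_predicted_direction=True))
```

The loop prints 0.012 and 0.022 and the last line prints 0.049, matching the values reported for the two effects.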
Hence, the results of the analysis not only indicate the expected effect of type of pacing in favour of system-pacing (system-pacing leads to higher learning outcomes than self-pacing) but also indicate that, in the static picture condition, learning outcome depends on visual cognitive style, with a weak negative correlation in the self-pacing condition and a weak positive correlation in the system-pacing condition.
For illustrative purposes, this relation in the static picture condition is displayed in Fig. 3. We performed a median split of visual cognitive style into less developed visualizers (LDV, N = 117, 73.5% female, age: M = 21.65; SD = 2.80 years) and highly developed visualizers (HDV, N = 118, 75.4% female, age: M = 21.73; SD = 2.66 years) and analysed simple effects of the triple interaction for the static picture condition and animation condition.
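The median split used for this illustration can be sketched as follows. With an odd N = 235 and no tied scores, exactly one participant sits at the median, so the reported group sizes (117 LDV, 118 HDV) are consistent with assigning that participant to the HDV group. This tie rule, the function name, and the demo scores are assumptions for illustration, not details reported for the study.

```python
from statistics import median


def median_split(scores, tie_group="HDV"):
    """Label each score LDV/HDV relative to the sample median.

    Scores exactly at the median go to tie_group (an assumed rule)."""
    m = median(scores)
    labels = []
    for s in scores:
        if s > m:
            labels.append("HDV")
        elif s < m:
            labels.append("LDV")
        else:
            labels.append(tie_group)
    return labels


# Hypothetical demo: 235 distinct factor scores
labels = median_split([i / 10 for i in range(235)])
```

Median splits discard information from a continuous covariate, which is why the study's inferential tests treat visual cognitive style as continuous and use the split only for display and follow-up simple effects.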
There were no significant simple effects in the animation condition (all simple main effects p > .10).

Results for learning outcome with verbal cognitive style as covariate
In the analysis with verbal cognitive style as the covariate, we found a main effect of type of visualization, F(1,227) = 16.97, p < .001, ηp² = .070: animation (M = 32.56, SD = 8.33) outperformed static pictures (M = 27.94, SD = 9.41). However, we did not obtain any other significant effects (all p ≥ .10; Table 4), so the second hypothesis (a triple interaction effect) was not confirmed for verbal cognitive style.

Results for cognitive load with visual cognitive style as covariate
Because the analysis of learning outcome with visual cognitive style as the covariate yielded significant effects, we also analysed cognitive load as a dependent measure, represented as the sum of points on the cognitive load scale (with 18 points as the possible maximum; the more points, the higher the cognitive load), again with visual cognitive style as the covariate.
The analysis with visual cognitive style as the covariate yielded one significant effect: the triple interaction of visual cognitive style, type of pacing, and type of visualization, F(1,227) = 4.17, p = .042, ηp² = .018. Table 5 shows the results of all effects of this analysis, and Fig. 4 displays the significant triple interaction.
As shown in Fig. 4, a highly developed visual cognitive style is associated with a higher level of cognitive load when learning with self-paced static pictures and with a slightly lower level of cognitive load when learning with system-paced static pictures. All correlations between visual cognitive style and cognitive load are weak: r = .20 for self-paced static pictures, r = −.12 for system-paced static pictures, r = −.09 for self-paced animation, and r = .11 for system-paced animation. Thus, in the static picture condition, cognitive load depends on visual cognitive style with a weak positive correlation in the self-pacing condition and a weak negative correlation in the system-pacing condition. Interestingly, in the animation condition the opposite is true: cognitive load depends on visual cognitive style with a weak positive correlation in the system-pacing condition and a weak negative correlation in the self-pacing condition. The difference between the correlation coefficients in the static picture condition is significant, p (one-tailed, 1-df test) = .089/2 = .044, while the difference between the correlation coefficients in the animation condition did not reach significance, p = .288.
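The text does not name the test used to compare the correlation coefficients. One standard option for correlations from independent groups is Fisher's r-to-z test, z = (atanh r₁ − atanh r₂)/√(1/(n₁ − 3) + 1/(n₂ − 3)), with a normal reference distribution. The Python sketch below assumes this test and roughly 59 participants per cell (235/4, because per-cell sizes are not given in this section); under these assumptions it yields two-tailed p values close to the reported .089 and .288, but it should not be read as the authors' exact procedure.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns (z, two-tailed p)."""
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than 3 observations")
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    # Two-tailed normal p: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_two = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two


# Assumed cell size of about 59 (not reported per cell in this section)
z_static, p_static = compare_independent_correlations(0.20, 59, -0.12, 59)
z_anim, p_anim = compare_independent_correlations(0.11, 59, -0.09, 59)
one_tailed_static = p_static / 2  # directional hypothesis, effect in predicted direction
```

With unequal cell sizes the standard error changes slightly, so the exact p values depend on the true group sizes.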

Discussion
As a follow-up to the results of Tabbers (2002) and Höffler and Schwartz (2011), the goal of our study was to investigate the role of visual-verbal cognitive style when learning with self- or system-paced animations or static pictures. We designed and applied a comparatively long computer-based learning environment, similar to an authentic multimedia lesson, that addressed a complex topic (primary reactions in photosynthesis) and implemented the modality principle (with spoken narration) in four different versions: system-paced animation, self-paced animation, system-paced static pictures, and self-paced static pictures. First, regarding the pacing effect, learners in the system-paced groups outperformed learners in the self-paced groups on learning outcome. Since the learning environments in our study were accompanied by spoken narration, this result is in line with the findings of Tabbers et al. (2001) and confirms that, when spoken narration is used, system-paced environments are more beneficial to learners than self-paced ones. Tabbers (2002) explains this effect by arguing that listening to spoken narration, in contrast to reading a text, is a passive process and is therefore better suited to linear presentations. Written text, by contrast, should evoke a more active and strategic way of learning and hence fit better with self-pacing (Tabbers 2002). Second, the results of our study seem to support the assumption that the impact of the treatment factors is moderated by visual cognitive style. Instead of a simple interaction of cognitive style and type of visualization, which could have provided more evidence regarding the usability of animations and static pictures for visualizers and verbalizers (cf. Höffler et al. 2010; Schnotz and Rasch 2005), we found a triple interaction between visual cognitive style, type of pacing, and type of visualization, indicating that this relation is more complex.
However, and quite unexpectedly, the obtained interaction effect showed a negative relation between visual cognitive style and learning outcome when learning with self-paced static pictures (r = −.28). More detailed median-split analyses of this effect showed that visual cognitive style was involved only when learning from static pictures (there were no significant effects for animation) and was related to a decrease in the performance of highly developed visualizers (HDV). When learning with self-paced static pictures, visual cognitive style was also positively correlated with experienced cognitive load (r = .20); that is, the higher the score on the visual cognitive style scale, the higher the experienced cognitive load. At the same time, HDV did not outperform less developed visualizers (LDV) on learning outcome in any condition.
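A median split dichotomizes a continuous score at the sample median so that the resulting high and low groups can be compared on an outcome. The minimal Python sketch below illustrates the idea; the helper names are hypothetical, and the source does not state how scores exactly at the median were assigned or whether the split was computed on the whole sample or within each condition, so the code assigns ties to the low group purely as an assumption.

```python
from statistics import mean, median


def median_split(scores):
    """Label each score 'high' (above the median) or 'low'.

    Assumption: scores equal to the median are assigned to 'low'.
    """
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def outcome_means_by_split(style_scores, outcomes):
    """Mean outcome for high vs. low scorers on a cognitive style scale."""
    if len(style_scores) != len(outcomes):
        raise ValueError("style_scores and outcomes must have equal length")
    groups = {"high": [], "low": []}
    for label, y in zip(median_split(style_scores), outcomes):
        groups[label].append(y)
    # Omit a group that ends up empty (e.g., when all scores are tied).
    return {label: mean(ys) for label, ys in groups.items() if ys}
```

Applied to visual cognitive style scores and learning outcomes within one condition, the "high" group would correspond to HDV and the "low" group to LDV.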
This result partly supports the findings of Höffler and Schwartz (2011), who observed the same pattern of decreased performance when learning with self-paced static pictures. The additional finding that this decrease is related to higher scores on the visual cognitive style dimension is novel. Although previous studies indicate that highly developed visualizers perform better than less developed visualizers when learning with static pictures (cf. Höffler et al. 2010; Schnotz and Rasch 2005), we should bear in mind that the study of Höffler et al. (2010) was conducted with written text, whereas this study used a spoken narration. This might suggest that the advantage of static pictures for highly developed visualizers depends on the modality of the accompanying explanatory text. Our study showed that a combination of spoken narration, static pictures, and self-pacing may impair the effectiveness of learning and increase the experienced cognitive load, but only in relation to higher scores on the visual cognitive style scale.
One might speculate that in the self-paced static pictures condition, highly developed visualizers experienced the learning environment as "too easy". Such an explanation is plausible, considering that visualizers are used to thinking in pictures (Mayer and Massa 2003) and hence have more experience in managing pictorial information. Self-paced static pictures might have provided highly developed visualizers, as a kind of expert in pictures, with so much help and guidance that it hindered them from processing information on their own, thus inducing additional cognitive load (cf. the expertise reversal effect; Jiang et al. 2018; Schnotz and Rasch 2005). The obtained positive correlation between visual cognitive style and experienced cognitive load may support this assumption. Further research is required to shed more light on this result.
Another possible explanation is that the combination of narration and self-paced static pictures was too cognitively demanding for HDV. According to Riding and Adams (1999), visual learners tend to generate mental images of written words. Might the same be true for spoken words? Moreover, there are findings suggesting that learners' characteristics, such as cognitive style, may determine how information is mentally represented to a greater extent than the way in which the information was conveyed (e.g., Kraemer et al. 2014). Self-paced instruction turned out to be more demanding overall when combined with a spoken narration. Could it be that the need to build mental models from static pictures while generating mental images of words exceeded the capacity of the visual channel in the group of HDV? Verifying this assumption requires further research.

Additionally, we obtained an overall superiority of animation over static pictures, both with visual cognitive style and with verbal cognitive style as a covariate. This result is in line with the findings of Höffler and Leutner (2007) and supports the assumption that animations are more beneficial to learners than static pictures (cf. Rieber 1991; Salomon 1979), at least when learning about a process, that is, when changes of things over time are shown, as was the case in our study. This effect should nevertheless be interpreted with caution, as animations may not lead to better learning outcomes than static pictures under other conditions (cf. Castro-Alonso et al. 2016).

Limitations of the study
In our study, students who learned with animations displayed better overall learning outcomes than students who learned with static pictures. This result must be interpreted with caution, though: first, it is restricted to a particular topic (in our study, the process of photosynthesis), and second, the effect might simply be caused by, for example, an inappropriate quality of the arrows in the static pictures condition. We are also aware that our interpretations of the triple interaction of cognitive style, type of pacing, and type of visualization are hypothetical. Some further limitations of our study prevent a more conclusive interpretation. For example, we did not record how intensively participants used the control buttons. Hasler et al. (2007) claim, though, that self-pacing is beneficial even if used infrequently. Nevertheless, actual information about how often and in what manner participants used the learner control buttons would have helped to explain our results more adequately. Furthermore, we could have obtained information on whether some learners used the control buttons to review crucial parts of the learning environment more than twice, thus gaining an advantage over other participants. However, it was not possible to review the entire learning environment more than twice within 20 min, whatever the condition. Therefore, we consider all conditions satisfactorily comparable.
Additionally, we obtained a significant triple interaction with cognitive load as the dependent measure; when interpreting this effect, however, one must bear in mind that the scale we used is a self-report of the cognitive load perceived by participants. We chose this very common scale in order to link our findings to previous research in the field in which the scale was also applied (e.g., Höffler and Schwartz 2011). Still, we are aware that a self-report is a subjective rather than an objective measure of cognitive load. In future studies, more objective measures such as pupillometric data (Klingner et al. 2008) or dual-task measurement (Brünken et al. 2004) might be the better option.

Conclusions and further research
There are considerable controversies regarding the advantages and disadvantages of different features of multimedia learning environments (such as the level of interactivity and the type of visualization), as well as the role of cognitive style. Our study contributes to an understanding of the consequences of combining different types of visualization and pacing in an authentic and complex learning environment with a spoken narration, and it also underlines the importance of learners' individual differences for learning outcome. Namely, our study showed that a spoken narration is better suited to a system-paced environment and supports learning with animations. The positive effects of system-pacing and dynamic visualizations may be especially pronounced in ecologically valid environments. However, the combination of spoken narration, self-pacing, and static pictures was associated with lower learning outcomes and higher cognitive load in highly developed visualizers. One possible explanation is the expertise reversal effect: highly developed visualizers may experience self-paced static pictures with narration as providing too much external support, which hinders them from constructing their own mental representations of the processed information. Another possible explanation is that, because visualizers tend to generate mental images of words, processing pictures and words in the self-paced condition was too demanding for their visual channel. To better understand and explain this result, future studies should consider how intensively learners use the learner control buttons. Further research could also apply eye-tracking methods to observe the eye movements of more and less highly developed visualizers when learning with self-paced or system-paced dynamic and non-dynamic representations, in order to gain deeper insight into participants' attention processes.
Such research would help to explain when and why students benefit most from self-paced conditions and when and why they benefit most from system-paced ones.