Introduction

Because of the COVID-19 pandemic, online learning became increasingly popular and necessary. Consequently, social (in-person) exchanges between students and teachers or instructors (e.g., Gehlbach, 2010) often decreased substantially. However, social exchanges with instructors and, more generally, social cues can have strong benefits and might be crucial for learning success in complex media-based environments (Schneider et al., 2021). Social cues are defined as all digital traces of social interactions in learning environments (e.g., anthropomorphism, nonverbal communication, the use of voice). To exploit these advantages, online learning environments can include pedagogical agents (PAs). Generally, PAs are computer-generated or designed characters that serve instructional purposes in educational settings (Martha & Santoso, 2019; Veletsianos & Russell, 2013). Recent meta-analyses showed that implementing PAs fosters learning outcomes (Castro-Alonso et al., 2021; Schroeder et al., 2013). Furthermore, the authors found that the effectiveness of PAs depends on several boundary conditions. For example, 2D agents are more effective than 3D agents at fostering learning outcomes. In addition, whether an agent used nonverbal communication, motion, or voice did not appear to moderate its effectiveness. Nevertheless, several characteristics of PAs have not yet been thoroughly investigated. In an extensive review, Martha and Santoso (2019) argued that research should focus on the emotional and motivational impacts of pedagogical agents on learners. Nonverbal cues, such as facial expressions, influence the emotional perception of the agent, and so do verbal cues, such as the expression of humour (Buttussi & Chittaro, 2019; Krämer et al., 2013). These emotional impacts might be further moderated by the similarity between the agent and the learner (Schunk, 1987).
Consequently, this research focuses on two factors: the emotional design of a PA and agent-learner similarity.

Emotional design

LeDoux and Brown (2017), as well as Pekrun and Stephens (2010), postulated that emotions play an important role during learning, since cognitive processing and emotional experience are inseparably connected. The emotional design hypothesis (Park et al., 2015) states that emotional experiences can be conducive (emotion as a facilitator) or obstructive (emotion as a suppressor) to learning. Plass and Kalyuga (2019) supported this assumption and specified that, on the one hand, emotional processing might exhaust cognitive resources. It could thus be seen as learning-irrelevant cognitive strain. On the other hand, emotions foster motivation and enhance the investment of mental effort. These conflicting perspectives on how emotions influence learning-relevant processes suggest two things. First, different mechanisms operate simultaneously. Second, not all boundary conditions have been specified yet.

Emotions as suppressor

According to Cognitive Load Theory (CLT; Sweller et al., 2019), emotional processing can be viewed as extraneous cognitive load. Resources must be allocated to emotional processing in addition to processing the information or the learning task. In line with the resource allocation theory of depression (Ellis & Ashbrook, 1988), resources are allocated to processes such as appraisal and emotion regulation, which do not contribute to the learning goal. Findings on the seductive details effect might serve as an additional theoretical explanation. Adding interesting but irrelevant information that has to be processed during learning impairs learning (Rey, 2012; Sundararajan & Adesope, 2020). The extraneous load induced by emotional processing is considered more harmful than the benefit of enhanced interest in the material (Park et al., 2011).

Emotions as facilitator

As outlined by Plass and Kalyuga (2019), emotions could foster motivation and are associated with enhanced mental effort in learning scenarios. This applies in particular to positive activation, that is, emotions with a positive valence and high arousal. According to the Control-Value Theory of Achievement Emotions (Pekrun, 2000), positive emotions in particular can enhance learners' perceived autonomy. Enhanced autonomy is associated with various benefits for motivation and effort investment. Recent research has also demonstrated its positive effect in multimedia learning (Schneider, 2021). This assumption is further supported by Stark et al. (2018), who showed that perceived control facilitates learning outcomes.

Empirical findings

Empirical results reflect the conflicting theoretical foundations and the heterogeneity of the research field. Some studies support the emotion-as-suppressor hypothesis, showing that emotions can hinder the focusing of attention (Norman, 2004) and reduce deductive reasoning (Oaksford et al., 1996). Other studies support the emotion-as-facilitator hypothesis, reporting positive effects of emotional cues on learning outcomes (Di Leo et al., 2019; Uzun & Yildirim, 2018). For example, verbal cues such as an instructor's enthusiasm are positively related to students' learning (Horan et al., 2012). Still other studies detected neither positive nor negative effects (Münchow & Bannert, 2019; Navratil & Kühl, 2019). Even when emotional cues foster motivation, learning is not necessarily affected (Wilson et al., 2018). To shed light on these inconsistent findings, a recent meta-analysis by Wong and Adesope (2021) investigated the effects of positive emotions on learning. Overall, emotional designs enhanced learning outcomes and motivation and increased effort.

Emotional design and pedagogical agents

Emotional design might be particularly important for pedagogical agents. Technical advances allow designers to create onscreen entities that are suitable for expressing emotions and emotional cues (Krämer et al., 2013). According to the affective animated PA system (Adamo et al., 2021), facial expressions and verbal cues within speech, among others, can convey emotional information. In particular, the voice of a social entity conveys affective cues for affective processing. It does so in addition to verbal content, which supports cognitive processing during learning (Nass & Brave, 2005). The positivity hypothesis states that people can recognize whether an instructor is displaying positive or negative emotion, even if the instructor is a computer-generated PA (equivalence principle; Horovitz & Mayer, 2021). This was supported by Lawson and Mayer (2021). They found that emotions conveyed by the voice were rated correctly regardless of whether the PA was visible. Nevertheless, the accuracy of emotion recognition and the perceived intensity depend on the animation style of the PA (Meyer, 2021). Moreover, the emotions of an actual human are detected more accurately than those of an animated PA (Horovitz & Mayer, 2021).

According to emotional response theory (Mottet et al., 2006), students' emotional responses to instructor messages mediate the relationship between instructor communication and student behavior. It is therefore reasonable to ask whether PAs that display emotional cues foster learning. Recent research has shown that the positive affective perception of a pedagogical agent is related to the learner's engagement with the learning environment (Saariluoma & Jokinen, 2014). This holds in particular for an agent's expressed enthusiasm. Enthusiasm has been described as stimulating, energetic, and motivating behaviour (Keller et al., 2016), as teacher expressiveness or teacher immediacy (Babad, 2007), and as a specific, seemingly effective mode of instruction (Shuell, 2001). When a PA conveys enthusiasm through verbal and nonverbal cues, it induces stronger positive emotions in learners. These emotions in turn enhance affective perceptions, intrinsic motivation, and cognitive outcomes (Liew et al., 2017). Moderator variables should be considered, such as the learner's mental load (Beege et al., 2020). Even so, enthusiasm can be viewed as an emotional cue in learning environments, because a strong relationship has been shown between verbal cues of instructional communication and learners' emotional responses (Horan et al., 2012). Enthusiasm in the voice can engage learners during video lectures (Guo et al., 2014). Furthermore, agents whose voices carry emotional cues are perceived as more affable than agents with a computer-generated voice (Kim et al., 2003).

Research on positive emotions in learning with human instructors further supports this. A positive instructor is rated as more likely to facilitate learning, more credible, and more engaging (Lawson et al., 2021b). Furthermore, learners reported paying more attention during learning and scored higher on a delayed post-test when learning with a positive instructor (Lawson et al., 2021a, 2021b). Teacher enthusiasm inspires students (e.g., Keller et al., 2014) and promotes students' mathematics achievement (e.g., Kunter et al., 2013). Teacher enthusiasm can lead to higher academic achievement as well as lower levels of off-task behavior (Brigham et al., 1992). It is further associated with enhanced intrinsic motivation and vitality (Patrick et al., 2000). Finally, teacher enthusiasm is a central mediator of the positive effect of teacher enjoyment on student enjoyment (Frenzel et al., 2009).

In summary, affective pedagogical agents are agents with emotional capabilities, in particular PAs that express enthusiastic cues. These agents, and the affectively stimulated social interaction they offer, are distinct from traditional computer-based tutoring and seemingly have a unique instructional impact (Kim et al., 2003).

Nevertheless, Schroeder and Gotch (2015) noted that agent voice still needs to be investigated. Some studies failed to isolate effects that result solely from manipulating the emotional cues in the agent's voice. Moreover, moderator variables seem to determine the effectiveness of affective PAs. The current investigation explores these moderator variables by considering an additional design principle that has been scarcely investigated to date: the gender of the PA and the resulting learner-agent similarity. There are numerous ways to increase the similarity between PAs and learners. For example, studies have addressed age (Beege et al., 2017) and cultural background (Baylor & Kim, 2004). However, research on gender has been the most productive (e.g., Ozogul et al., 2013), so gender was chosen to investigate similarity. Thus, the central research question is as follows. Are emotional cues displayed by a PA who is perceived as similar to the learner (matching gender) more effective for the learner's emotional activation than those displayed by a dissimilar lecturer, and consequently more beneficial for learning processes?

Gender, model-observer similarity, and similarity-attraction

According to the model-observer similarity hypothesis, the effectiveness of modelling depends in part on the degree to which observers perceive a model to be similar to them (Schunk, 1987). A related hypothesis is discussed with regard to pedagogical agents: the similarity-attraction hypothesis. Learners are attracted to real humans and digital entities that match their personality and other human characteristics (Byrne & Nelson, 1965). A learner might be more attracted to, and pay more attention to, an instructor who is perceived as similar (Berscheid & Walster, 1969), because modelling prompts observers to engage in social comparison (Berger, 1977). For example, studies with human instructors revealed that black students persist longer in STEM education when taught by a black instructor (Price, 2010). Results with regard to gender are less consistent. Some studies found that gender-based model-observer similarity did not influence learning. Female learners did not profit from a female lecturer in STEM education (Krämer et al., 2016). Hoogerheide et al. (2016) pointed out that model-observer similarity influences the evaluation of the instruction, but not learning success. Furthermore, studies showed that gender in general did not seem to matter for instructional effectiveness (Hoogerheide et al., 2018; Robb & Robb, 1999) or for rated teacher effectiveness (Tran & Do, 2020). Other researchers reported that students were less likely to persist in STEM courses when taught by female instructors (Price, 2010). They also found that male instructors were preferred over female instructors (Clayson, 2020). Clayson (2020), however, emphasized an important point. In his sample, ratings of instructors did not differ significantly by the students' own gender and academic major. Only male students ranked politically conservative instructors higher than female students did.
Consequently, it can be assumed that instructor gender role is more important than instructor gender in affecting students' evaluations. Freeman (1994) found that a lecturer's female or male characteristics are important regardless of the instructor's actual gender. Examples of such characteristics are affectionate and loving vs. assertive and forceful, or compassionate vs. independent. According to identity-based motivation theory (Karabenick & Urdan, 2014), instructors should be compatible with the learner's gender identity. Identity incongruence is accompanied by negative consequences. If a female student perceives evidence that a STEM course is not made for “people like [her]”, she becomes less likely to pursue a future in STEM. With regard to gender identity, a female instructor narrows the gender gap in engagement and interest. Furthermore, both female and male students tend to respond to instructor gender (Solanki & Xu, 2018). In STEM education, female instructors can serve as role models and communicate the idea that there is a place for women in STEM. For example, women's mathematics anxiety and their anxiety about the specific class were related to their endorsement of the gender stereotype. Continued interaction with a female instructor as a role model, however, reduced this deficit for women in a mathematics course by the end of the semester (Kapitanoff & Pandey, 2017). In line with this, Bailey et al. (2020) found that female students' final course grades were as much as .2 standard deviations higher in classes with a female instructor and/or a female student majority.

Results with regard to PAs are similarly ambiguous. Baylor and Kim (2004) found that a male PA led to higher reported self-efficacy and self-regulation than a female PA. In an additional study, the authors extended these results by showing that male agents were perceived as more extraverted and agreeable than female agents (Baylor & Kim, 2003). Additionally, learners receiving the male agent were more satisfied with their performance. However, Schroeder and Adesope (2015) did not find effects of the agent's gender. Ozogul et al. (2013) explicitly investigated agent-learner similarity. Female students gave higher program ratings when the PA matched their gender. Male students gave higher program ratings than female students when the PA did not match their gender. When learners had the opportunity to select a PA for the learning task, both male and female students tended to select a same-gender PA (Ozogul et al., 2013). Furthermore, neither Moreno and Flowerday (2006) nor Linek et al. (2010) found evidence for the similarity-attraction hypothesis. The design of Kim et al. (2007) resembles that of the current experiment and is therefore discussed in detail. In a 3 × 2 design, the authors investigated the emotional expression of the PA (positive vs. neutral vs. negative) and the gender of the PA (male vs. female). An emotionally positive PA had a positive impact on learners' interest and self-efficacy. An interaction effect revealed that a positive male agent was most effective for enhancing motivation. Furthermore, a male agent increased retention performance compared with a female agent. However, the authors did not consider the gender of the learner in their experimental design.

Hypotheses

In summary, research is still needed on how the emotional design of a PA interacts with further design characteristics. The current study therefore aimed to investigate the general emotional design principle as a function of individual learner characteristics such as participant gender. The central research question is: Is PA enthusiasm more effective when the pedagogical agent is perceived as similar to the learner? First, based on recent meta-analyses (e.g., Wang et al., 2021) and studies on affective PAs (Liew et al., 2017), it was assumed that an enthusiastic pedagogical agent would enhance learners' emotional activation and, consequently, learning processes.

H1

Learners receiving an enthusiastic PA report a higher positive activation than learners receiving a neutral PA.

H2

Learners receiving an enthusiastic PA achieve higher learning outcomes than learners receiving a neutral PA.

With regard to agent-learner similarity, it is assumed that the affective impact of a PA on learning processes is stronger when the PA is perceived as similar to the learner (matching gender). The rationale is that the effectiveness of modelling depends in part on the degree to which observers perceive a model to be similar to them (model-observer similarity hypothesis; Schunk, 1987). It has to be noted that results for both human lecturers and pedagogical agents are ambiguous, so no general theoretical or practical implications can be drawn. Thus, two exploratory hypotheses were formulated.

H3

An enthusiastic PA enhances positive activation, in particular, when genders of the agent and the learner match.

H4

An enthusiastic PA enhances learning outcomes, in particular, when genders of the agent and the learner match.

Previous investigations pointed to a potential effect of enthusiastic cues on cognitive load (Ellis & Ashbrook, 1988; Sundararajan & Adesope, 2020). Cognitive load was therefore explored as an additional variable to gain insight into the learning process. Enthusiastic cues might induce extraneous load. However, agent-learner similarity might reduce the perception of irrelevant load, because learners might be more attracted to the PA, pay more attention to it (Berscheid & Walster, 1969), and thus be less distracted. Since the evidence was insufficient to derive a concrete hypothesis, a research question was formulated.

RQ:

Are perceptions of cognitive load influenced by an enthusiastic PA in dependence on model-observer similarity considering gender?

Methods

Participants and design

Overall, the studies on emotional design in multimedia learning discussed above showed diverse effect sizes. A meta-analysis (Brom et al., 2018) reported small to medium effects of positive emotions on learning. However, some studies show medium to large effect sizes (e.g., Schneider et al., 2016). Thus, the current study was powered for a medium effect size. According to an a-priori power analysis (f = .25; α = .05; 1 − β = .80; 2 × 2 design), 128 participants were required. A total of 129 undergraduate students participated in this experiment (67.4% female; age: M = 22.92, SD = 3.14; ethnicity: Caucasian). They came from Chemnitz University of Technology (71.3%) and Freiburg University of Education (28.7%). Participants were university students (97.4%) or non-student employees (2.6%). Students were enrolled in the following majors: media and communication studies (34.6%), learning studies (32.3%), teacher education (19.7%), and others (e.g., mechanical engineering; 13.4%). Each participant received either 5€ (28.7%) or one hour of course credit (71.3%). Students were recruited via email. After reading the email, students could register online for a participation time slot via the registration tool. There were no restrictions on participation.
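
The required sample size can be reproduced approximately with a short power calculation. The sketch below assumes the standard fixed-effects ANOVA power model for a single-df main effect or interaction: noncentrality λ = f²·N, numerator df = 1, error df = N − 4 for the four cells of a 2 × 2 design. This is the model used by common power-analysis software. The noncentral F distribution is evaluated with the usual Poisson-mixture formula using only the Python standard library. All function names are hypothetical helpers, not part of any package.

```python
import math


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def _beta_cf(a, b, x, max_iter=500, eps=1e-14, tiny=1e-300):
    """Modified Lentz evaluation of the incomplete-beta continued fraction."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _critical_beta_point(d1, d2, alpha):
    """Critical value of the central F test, expressed on the beta scale y = d1*F / (d1*F + d2)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection: I_y(d1/2, d2/2) increases monotonically in y
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(d1 / 2, d2 / 2, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def anova_power(f, n_total, n_groups, df_effect=1, alpha=0.05):
    """Power of a fixed-effects ANOVA F test, assuming lambda = f^2 * N."""
    d1, d2 = df_effect, n_total - n_groups
    y_c = _critical_beta_point(d1, d2, alpha)
    # Noncentral F CDF = Poisson(lambda / 2) mixture of incomplete beta functions.
    mu = f * f * n_total / 2.0
    weight, cdf = math.exp(-mu), 0.0
    for j in range(int(mu + 12.0 * math.sqrt(mu)) + 30):
        cdf += weight * reg_inc_beta(d1 / 2 + j, d2 / 2, y_c)
        weight *= mu / (j + 1)
    return 1.0 - cdf


def required_total_n(f, n_groups, df_effect=1, alpha=0.05, target=0.80):
    """Smallest total sample size whose power reaches the target."""
    n = n_groups + 2
    while anova_power(f, n, n_groups, df_effect, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    n = required_total_n(f=0.25, n_groups=4)
    print(n, round(anova_power(0.25, n, 4), 4))
```

Under these assumptions, the search should end at N = 128, where power is just above .80. The step-by-step search is slower than a closed-form approximation, but it mirrors how an exact minimal N is found.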

The experiment was carried out as a 2 × 2 between-subjects design. Each student was randomly assigned to one cell by drawing lots (enthusiasm of the PA: enthusiastic vs. neutral; gender of the PA: male vs. female). Additionally, a third, quasi-experimental factor was considered (gender of the participant: male vs. female). The cells were as follows:
- enthusiastic male PA: 31 students (eight male, 23 female);
- neutral male PA: 36 students (14 male, 22 female);
- enthusiastic female PA: 32 students (14 male, 18 female);
- neutral female PA: 30 students (six male, 24 female).

The low number of male students reflects the distribution of students across the study programs. This has to be considered when interpreting the results.
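
A quick tally of the reported cell sizes shows how the quasi-experimental learner-gender factor splits the four randomized cells into eight, some of them small. The dictionary below only re-encodes the numbers from the preceding paragraph. The variable names are illustrative.

```python
# (PA enthusiasm, PA gender) -> (male learners, female learners), as reported above
cells = {
    ("enthusiastic", "male"): (8, 23),
    ("neutral", "male"): (14, 22),
    ("enthusiastic", "female"): (14, 18),
    ("neutral", "female"): (6, 24),
}

total = sum(males + females for males, females in cells.values())
female_share = sum(females for _, females in cells.values()) / total
smallest_cell = min(min(males, females) for males, females in cells.values())
print(total, round(100 * female_share, 1), smallest_cell)
```

The tally recovers the reported sample size and the 67.4% share of female participants. It also makes visible that the smallest of the eight design cells contains only six learners.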

There were no significant differences between the experimental groups in age or prior knowledge, F(2, 124) = [.03; .53], p = [.59; .97]. Nor were there differences in university or subject of study, χ² = [1.49; 7.77], p = [.051; .48].

Materials

The learning material consisted of four versions of an educational video, one per experimental condition, each lasting approximately nine minutes. In the video, the PA delivered an oral presentation on facts about the human respiratory system. This content was chosen because prior knowledge of it is generally low among most student populations. Additional pictures were inserted to illustrate the anatomical and functional information discussed. A screenshot of the video is displayed in Fig. 1.

Fig. 1

Screenshot of the instructional video

Animated pedagogical agents

Pedagogical agents were animated using Adobe Character Animator Version 3.2 (Adobe Inc., 2020). The agents were designed as student-aged, comic-like characters with purple hair and pants and a green shirt and shoes. Animation was carried out via webcam tracking. The designer of the agents sat in front of the webcam and delivered the same oral presentation for all videos. Adobe Character Animator tracked the movements of the eyes, eyebrows, lips, and head and transferred them to the PA. The rest of the PA's body did not move during the video.

Gender was manipulated through slight changes to the hair. The male PA had short hair, whereas the female PA had two additional long strands framing the face. In addition, unlike the male agent, the female PA had four visible eyelashes (see Fig. 2). The voice was also matched to the agent's gender: the video with the male agent was recorded by a male speaker, and the video with the female agent by a female speaker.

Enthusiasm was manipulated through both the agent's mouth and its voice. In the enthusiastic condition, the agent smiled throughout the lecture to increase positive activation through visual cues. In line with Liew et al. (2017), the voice had an enthusiastic emotional tone, with large dynamic pitch variation and a high pitch contour. In the neutral condition, the mouth was a neutral horizontal line, and the voice was calm and pleasant, with a low pitch level and small pitch variations.
The text had to be recorded four times (male enthusiastic, male neutral, female enthusiastic, female neutral). Therefore, the length of the recordings was equated in post-production to avoid confounds with learning time or processing speed. An example of all visual manipulations is displayed in Fig. 2.

Fig. 2

Pedagogical agents (from left to right: enthusiastic male, neutral male, enthusiastic female, neutral female)

Pre-study

A pre-study was conducted to check whether the manipulation was suitable for our investigation. The first aim was to verify that learners recognized the positive affect. An equally important aim was to verify that the gender of the PA had no main effect on emotional variables. This ensured that the different speakers (male vs. female) did not by themselves influence affective processing. Short video clips (about one minute) in the design of the original experimental videos were presented in a 2 × 2 between-subjects design identical to that of the main experiment. Thirty-four participants rated the pedagogical agents in terms of emotional positivity, perceived enthusiasm, emotional negativity, boredom, sympathy, and anthropomorphism. Gender of the learner was included as a covariate in all analyses. Results revealed large main effects of the experimental factor enthusiasm. Enthusiastic PAs were perceived as more positive, F(1, 29) = 35.42, p < .001, ηp² = .55, and more enthusiastic, F(1, 29) = 92.92, p < .001, ηp² = .76. They were also perceived as less negative, F(1, 29) = 28.82, p < .001, ηp² = .50, and less bored, F(1, 29) = 76.81, p < .001, ηp² = .73. No difference was found with regard to anthropomorphism, F(1, 29) = 1.46, p = .24, ηp² = .05. The between-subjects factor gender of the PA had no effects, F(1, 29) = [1.11; 3.41], p = [.08; .30]. Consequently, the enthusiasm manipulation influenced the emotional perception of the PA. In contrast, no emotional confound related to the gender of the PA was found.
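
The reported effect sizes can be checked against the F statistics with the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The helper name below is hypothetical. It only re-expresses this identity and uses the pre-study values as input.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Pre-study effects of enthusiasm, each tested with F(1, 29)
pre_study = [("positivity", 35.42), ("enthusiasm", 92.92), ("negativity", 28.82),
             ("boredom", 76.81), ("anthropomorphism", 1.46)]
for label, f_value in pre_study:
    print(label, round(partial_eta_squared(f_value, 1, 29), 2))
```

Rounded to two decimals, these values match the ηp² values reported for the pre-study.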

Measures

Emotional processes

Participants' emotional responses were measured with the PANAVA-KS (Schallberger, 2005). In this questionnaire, students rated twelve items describing how they felt at the moment. Each item was a 5-point scale anchored by antonymous adjectives.
- Positive activation: four items (before learning: α = .71; after learning: α = .85; e.g., a scale ranging from “bored” to “excited”).
- Negative activation: four items (before learning: α = .75; after learning: α = .79; e.g., a scale ranging from “peaceful” to “upset”).
- Valence: two items (before learning: α = .67; after learning: α = .81; e.g., a scale ranging from “dissatisfied” to “satisfied”).

The PANAVA-KS was administered before video reception (baseline measurement) and after video reception. For the statistical analyses, scores before video reception were subtracted from scores after reception. The resulting change scores reflect emotional change during the instructional video and control for baseline levels.
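
The two computations described above, internal consistency (Cronbach's α) and baseline-corrected change scores, can be sketched as follows. This is a minimal illustration, not the authors' analysis script. It assumes that a subscale score is the mean of its items; that scoring choice and the example ratings are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a participants x items matrix given as a list of rows."""
    k = len(responses[0])
    item_variances = [variance(item) for item in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def change_scores(pre, post):
    """Post-video subscale score minus pre-video subscale score for each participant.

    Assumes (hypothetically) that a subscale score is the mean of its items.
    """
    return [mean(after) - mean(before) for before, after in zip(pre, post)]


# Hypothetical 5-point ratings of three participants on a two-item subscale
pre = [[1, 2], [2, 1], [3, 3]]
post = [[3, 3], [2, 3], [4, 4]]
print(round(cronbach_alpha(pre), 3))   # internal consistency of the baseline ratings
print(change_scores(pre, post))        # positive values = increase after the video
```

Because α uses the ratio of item variances to total-score variance, the same value results whether sample or population variances are used, as long as the choice is consistent.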

Cognitive load

Cognitive load was measured using the self-report scale by Klepsch et al. (2017). This scale was chosen because the questionnaire refers to the complexity of the content and the recognition of important information.
- Intrinsic load (ICL; α = .84): two items (e.g., “This task was very complex.”).
- Extraneous load (ECL; α = .83): three items (e.g., “During this task, it was exhausting to find the important information.”).
- Germane processes (GCL; α = .45): two items (e.g., “My point while dealing with the task was to understand everything correct.”).

Participants rated the items on a 7-point Likert scale ranging from 1 (absolutely wrong) to 7 (absolutely correct). The low reliability of the GCL measure has to be considered when discussing the results.

Knowledge measures

Three types of knowledge measures were included: prior knowledge, retention, and transfer. Retention can be defined as remembering or reproducing information presented in the instructional videos. Transfer, in contrast, refers to applying the knowledge to solve novel problems that were not explicitly presented in the learning material (Mayer, 2014).

First, prior knowledge was measured with five open-answer questions about the respiratory system (blood-air barrier, lung, respiratory process, bronchi, structure of the trachea; α = .72). Students were asked to write down as many facts as they knew. They received one point for each piece of information that was part of the learning material. Students were told to write “I don't know” if they did not know any information. The inter-rater reliability of two trained raters was good to excellent, ICC(1, k) = [.77, .93], F(128, 128) = [7.60, 28.59], p < .001 (Koo & Li, 2016). Overall, prior knowledge was low (M = 2.28 points, SD = 1.79).
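
Inter-rater agreement of this kind is usually computed from a one-way random-effects ANOVA over participants (Shrout and Fleiss forms). The sketch below assumes that every participant's answers were scored by the same number k of raters. It returns both the single-rater ICC(1,1) and the average-rater ICC(1,k), because the two forms are easy to confuse. The accompanying F test is MS_between / MS_within with n − 1 and n(k − 1) degrees of freedom. The function name and data are hypothetical.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) and ICC(1,k).

    `ratings` holds one row per participant, each row containing the k raters' scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average


# Hypothetical points awarded by two raters to three participants
print(icc_oneway([[1, 2], [2, 3], [3, 4]]))
```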

Second, retention was measured with a 15-item questionnaire (α = .78) comprising thirteen multiple-choice items and two open-answer questions. Each multiple-choice item offered up to six answer options, and anywhere from one to all of them could be correct. Students received points for selecting correct answers and for not selecting incorrect answers. For the open-answer questions, inter-rater reliability was moderate to good, ICC(1, k) = [.57, .85], F(128, 128) = [3.64, 14.10], p < .001. Retention questions covered information that was explicitly presented in the learning material. The maximum score was 66 points.
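
The multiple-choice scoring rule can be made concrete. The text states that points were given for selecting correct options and for not selecting incorrect ones. The sketch below assumes one point per answer option whose selection status matches the answer key. The per-option point value and the function name are assumptions for illustration.

```python
def score_multiple_choice(selected: set, correct: set, n_options: int) -> int:
    """One point per option whose selection status matches the key (assumed scoring rule)."""
    return sum((option in selected) == (option in correct) for option in range(n_options))


# Hypothetical item with six options, of which options 0 and 2 are correct
print(score_multiple_choice(selected={0, 1}, correct={0, 2}, n_options=6))
```

In the usage example, the learner selects one correct option, misses the other, wrongly selects one option, and correctly leaves three options unselected. That yields 4 of 6 possible points.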

Third, transfer was measured with five open-answer questions, each presenting a new scenario (α = .80). Every scenario could be solved with the knowledge obtained from the learning material. The inter-rater reliability was moderate to excellent, ICC(1, k) = [.74, .94], F(128, 128) = [7.30, 32.74], p < .001. The maximum score was 13 points.

Procedure

The investigation took place online. Up to four students at a time logged in to the educational online platform BigBlueButton. Each student was assigned to a breakout room and received a link for study participation. Students were instructed to share their screen until the end of video reception, which allowed the experimenter to see their display in real time and monitor participation. Afterwards, screen sharing was ended so as not to put pressure on participants while they completed the tests. The investigation started once students had received the link.

First, participants were informed that the experiment was a video study on the respiratory system and were asked to answer the prior-knowledge test. Afterwards, they completed the first PANAVA-KS. Next, they were given the video link and asked to watch the educational video once. The dependent variables were measured after the video presentation in the following order: second PANAVA-KS, cognitive load, retention, transfer. Finally, demographic questions were asked. Since participant gender played a major role, the wording of the gender question is reported here: “With which gender do you identify? (Male/Female/Diverse)”. Once all tests were completed, participants could leave the online platform. Altogether, the experiment lasted 40 min on average.

Analysis strategy

Multivariate analyses of variance (MANOVAs) and univariate analyses of variance (ANOVAs) were conducted to assess differences between groups. For all analyses of variance, the independent variables were enthusiasm (enthusiastic vs. neutral), gender of the PA (female vs. male), and gender of the participant (female vs. male). The three-way interactions were particularly relevant for testing Hypotheses 3 and 4. Therefore, graphical analyses are presented in addition to the ANOVAs. No other variable (i.e., age, gender, grade, prior knowledge) differed significantly among the experimental groups, so no covariate was used. Test assumptions are reported only where significant violations occurred. Descriptive results for all dependent variables are shown in Table 1.
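
Gender match is defined jointly by the two gender factors. As a result, the three-way interaction used for Hypotheses 3 and 4 is the same contrast as an enthusiasm × gender-match interaction. The sketch below illustrates this with ±1 effect coding. The coding scheme is an illustrative assumption, not necessarily the one used in the reported analyses.

```python
from itertools import product

# Effect coding (+1 / -1) for the three between-subjects factors (illustrative choice)
ENTHUSIASM = (+1, -1)      # +1 = enthusiastic PA, -1 = neutral PA
PA_GENDER = (+1, -1)       # +1 = male PA, -1 = female PA
LEARNER_GENDER = (+1, -1)  # +1 = male learner, -1 = female learner

rows = []
for e, pa, learner in product(ENTHUSIASM, PA_GENDER, LEARNER_GENDER):
    three_way = e * pa * learner           # cell weight in the E x PA x learner contrast
    match = +1 if pa == learner else -1    # +1 when agent and learner genders match
    rows.append((e, pa, learner, three_way, e * match))

for row in rows:
    print(row)
```

In every one of the eight cells, the three-way weight equals the enthusiasm × match weight. This is why a significant three-way interaction can be read as "the enthusiasm effect differs between matching and non-matching dyads".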

Table 1 Means and standard deviations of all dependent variables for the experimental groups

Results

Emotional processes

A MANOVA was conducted with positive activation, negative activation, and valence as dependent variables. Significant main effects were found for enthusiasm, Wilks’ Λ = .86; F(3, 119) = 6.32, p < .001, ηp² = .14, and for gender of the learner, Wilks’ Λ = .92; F(3, 119) = 3.49, p = .02, ηp² = .08. The main effect of PA gender and the two-way interactions did not reach significance (p > .05). There was a significant three-way interaction, Wilks’ Λ = .87; F(3, 119) = 6.19, p < .001, ηp² = .14.
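For an effect with a single hypothesis degree of freedom, as every effect in this 2 × 2 × 2 design has, Wilks’ Λ converts exactly to an F statistic, and partial eta squared equals 1 - Λ. The sketch below assumes an error df of 121, taken from the univariate follow-up tests reported as F(1, 121); because Λ is reported to two decimals, the recovered F only approximates the published value.

```python
def wilks_to_f(wilks_lambda, n_dvs, df_error):
    """Exact F for a multivariate effect with one hypothesis degree of freedom (s = 1).

    wilks_lambda -- Wilks' Lambda for the effect
    n_dvs        -- number of dependent variables (p)
    df_error     -- univariate error df (N minus number of cells)
    """
    df1 = n_dvs
    df2 = df_error - n_dvs + 1
    f = (1 - wilks_lambda) / wilks_lambda * df2 / df1
    partial_eta_sq = 1 - wilks_lambda  # 1 - Lambda**(1/s) with s = 1
    return f, df1, df2, partial_eta_sq


# Main effect of enthusiasm on the three emotion measures
f, df1, df2, eta = wilks_to_f(0.86, n_dvs=3, df_error=121)
print(f"F({df1}, {df2}) = {f:.2f}, partial eta squared = {eta:.2f}")
# prints: F(3, 119) = 6.46, partial eta squared = 0.14
```

The recovered degrees of freedom, (3, 119), match those reported for the MANOVA.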

Follow-up ANOVAs for the main effect of enthusiasm revealed significant effects on positive activation, F(1, 121) = 11.20, p = .001, ηp² = .09, and valence, F(1, 121) = 6.47, p = .01, ηp² = .05: an enthusiastic agent enhanced positive activation and perceived valence. Because Levene’s test was significant for negative activation (p = .01), Welch-corrected ANOVAs were conducted for this variable; they revealed no significant effects (p > .05). For the three-way interaction, significant effects were found for positive activation, F(1, 121) = 16.27, p < .001, ηp² = .12, and valence, F(1, 121) = 9.13, p = .003, ηp² = .07. For negative activation, interaction effects could not be tested because the homogeneity-of-variance assumption was violated. Graphical analyses (see Fig. 3) revealed that an enthusiastic agent fostered positive activation and perceived valence only when the genders of agent and learner matched. Descriptively, negative activation was reduced when genders matched.
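For univariate tests, partial eta squared follows directly from the F value and its degrees of freedom: since ηp² = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df1) / (SS_error / df2), it equals F·df1 / (F·df1 + df2). The helper below is a minimal sketch of that identity, applied to the three-way interaction tests on emotional processes.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)


# Three-way interaction follow-ups on emotional processes, F(1, 121)
for label, f in [("positive activation", 16.27), ("valence", 9.13)]:
    print(f"{label}: partial eta squared = {partial_eta_squared(f, 1, 121):.2f}")
# prints:
# positive activation: partial eta squared = 0.12
# valence: partial eta squared = 0.07
```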

Fig. 3

Bar diagrams of emotional processes for all experimental conditions; Ms and SDs of difference scores (score after video reception minus score before video reception) are displayed

Cognitive load

A MANOVA was conducted with ICL, ECL, and GCL as dependent variables. A significant main effect was found for enthusiasm, Wilks’ Λ = .79; F(3, 119) = 4.59, p < .001, ηp² = .22, as was a significant interaction between the gender of the agent and the gender of the learner, Wilks’ Λ = .87; F(3, 119) = 5.87, p < .001, ηp² = .13. The remaining main effects and two-way interactions did not reach significance (p > .05). There was a significant three-way interaction, Wilks’ Λ = .87; F(3, 119) = 5.78, p = .001, ηp² = .13.

Because Levene’s test was significant for ECL (p = .01), a Welch-corrected ANOVA was conducted for this variable. Follow-up ANOVAs for the main effect of enthusiasm revealed significant effects on ECL, F(1, 127) = 16.10, p < .001, ηp² = .12, and GCL, F(1, 121) = 15.89, p < .001, ηp² = .12: an enthusiastic agent reduced ECL and enhanced GCL. There was no significant effect on ICL (p > .05). For the interaction between the gender of the agent and the gender of the learner, a follow-up ANOVA revealed a significant effect on GCL, F(1, 121) = 5.84, p = .02, ηp² = .05; GCL was higher when genders matched, particularly for male participants. For the three-way interaction, a significant effect was found for ICL, F(1, 121) = 17.30, p < .001, ηp² = .13, but not for ECL or GCL (p > .05). Graphical analyses (see Fig. 4) revealed that, when genders matched, an enthusiastic agent reduced ICL, whereas a neutral agent increased it. Descriptively, an enthusiastic agent also reduced ECL and enhanced GCL when genders matched, but these differences did not reach significance.
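Levene’s test and the Welch correction used for ECL and negative activation can be sketched as follows. Levene’s test (in its mean-centred form) is a one-way ANOVA on each score’s absolute deviation from its group mean; Welch’s ANOVA replaces the pooled error term with precision weights n_i / s_i², so it does not assume equal variances. This is a minimal sketch: which Levene variant the authors used is not stated, the group scores below are invented for illustration, and only F and its degrees of freedom are returned (a p value would come from the F distribution, for example via `scipy.stats.f.sf`).

```python
from statistics import mean, variance


def one_way_f(groups):
    """Classic one-way ANOVA F assuming equal variances; returns (F, df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k)), k - 1, n_total - k


def levene_f(groups):
    """Mean-centred Levene test: one-way ANOVA on absolute deviations from group means."""
    return one_way_f([[abs(x - mean(g)) for x in g] for g in groups])


def welch_f(groups):
    """Welch's one-way ANOVA, robust to unequal variances; returns (F, df1, df2)."""
    k = len(groups)
    n = [len(g) for g in groups]
    w = [len(g) / variance(g) for g in groups]  # precision weights n_i / s_i^2
    w_sum = sum(w)
    weighted_mean = sum(wi * mean(g) for wi, g in zip(w, groups)) / w_sum
    a = sum(wi * (mean(g) - weighted_mean) ** 2 for wi, g in zip(w, groups)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)


# Invented ECL-like scores for two conditions (not the study's data)
neutral = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3]
enthusiastic = [1.2, 2.9, 0.8, 3.1, 1.0, 2.6]
print("Levene:", levene_f([neutral, enthusiastic]))
print("Welch:", welch_f([neutral, enthusiastic]))
```

With two groups, as in the follow-up tests on the enthusiasm main effect, Welch’s F equals the square of Welch’s t statistic, and its denominator df equals the Welch-Satterthwaite df.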

Fig. 4

Bar diagrams of cognitive load for all experimental conditions; Ms and SDs of ICL (intrinsic cognitive load), ECL (extraneous cognitive load), and GCL (germane cognitive load) are displayed

Learning outcomes

A MANOVA was conducted with retention and transfer as dependent variables. Significant main effects were found for enthusiasm, Wilks’ Λ = .79; F(2, 119) = 15.74, p < .001, ηp² = .21, and gender of the PA, Wilks’ Λ = .94; F(2, 119) = 3.68, p = .03, ηp² = .06. The main effect of learner gender, the interaction between enthusiasm and the gender of the PA, and the three-way interaction did not reach significance (p > .05). There was a significant interaction between enthusiasm and the gender of the learner, Wilks’ Λ = .87; F(2, 119) = 9.27, p < .001, ηp² = .14, and a significant interaction between the gender of the PA and the gender of the learner, Wilks’ Λ = .75; F(2, 119) = 19.91, p < .001, ηp² = .25.

Follow-up ANOVAs for the main effect of enthusiasm revealed a significant effect on retention, F(1, 120) = 3.45, p < .001, ηp² = .20: an enthusiastic agent enhanced retention performance. There was no significant effect on transfer (p > .05). Furthermore, a male agent enhanced retention performance, F(1, 120) = 5.09, p = .03, ηp² = .04, but not transfer performance (p > .05). For the interaction between enthusiasm and the gender of the learner, a significant effect was found for retention, F(1, 120) = 18.37, p < .001, ηp² = .13, but not for transfer (p > .05). For the interaction between the two genders, a significant effect was found for retention, F(1, 120) = 39.93, p < .001, ηp² = .25, and a marginal effect for transfer (p = .054). Graphical analyses (see Fig. 5) revealed that retention was fostered when genders matched. Male participants in particular benefited from an enthusiastic and/or a male agent. Female learners did not benefit from an enthusiastic agent per se: an enthusiastic agent improved their retention only when genders matched. Descriptively, matching genders also fostered transfer performance, and male participants showed higher transfer with an enthusiastic agent, but these transfer differences did not reach significance.

Fig. 5

Bar diagrams of learning outcomes for all experimental conditions; Ms and SDs of retention and transfer are displayed

Discussion

The current study investigated the effect of an affective PA as a function of model-observer similarity with respect to gender. H1 was supported and H2 was partially supported: an enthusiastic PA enhanced positive activation and retention performance but did not influence transfer performance. The three-way interactions partially supported H3 as well: matching genders enhanced positive activation and reduced ICL when learning with an enthusiastic PA. Results for learning outcomes were less clear. Whereas transfer performance was not influenced by the experimental manipulation, the interaction effects on retention require discussion. Male participants generally benefited from an enthusiastic PA, whereas female learners benefited only from a female enthusiastic PA. Overall, matching genders in combination with an enthusiastic PA led to the highest retention scores. Thus, H4 was partially supported. Regarding the research question on cognitive load, an enthusiastic PA reduced ECL and enhanced GCL. Furthermore, an enthusiastic agent combined with matching genders led to a reduced perception of ICL.

The results support the positivity and voice hypotheses (Lawson & Mayer, 2021; Nass & Brave, 2005). Facial expressions as well as verbal cues within the speech conveyed emotional information, which led to positive activation of the learner. Positive activation produced a cognitive benefit, which was reflected in reduced ECL. In line with the emotions-as-facilitator hypothesis (Plass & Kalyuga, 2019), resource allocation was shown to be influenced by an enthusiastic PA. Participants invested mental effort in the learning task and, in turn, resources were allocated to schema construction, as reflected in enhanced GCL and retention scores. Transfer scores were generally not influenced by the experimental factors. Consequently, manipulating enthusiasm within a short video clip seems to influence mainly basic learning processes.

The experiment further showed that model-observer similarity, or similarity attraction, can play an important role in social learning scenarios. Prior research did not detect effects of matching gender between instructor and learner on learning (e.g., Hoogerheide et al., 2016, 2018), but these studies did not explicitly focus on model-observer similarity with regard to emotional activation. Model-observer similarity might be particularly relevant for inducing affective processes. Learners prefer to learn with a same-gender instructor or agent (Ozogul et al., 2013), and, consequently, affective cues have a greater impact on positive activation and learning processes. This was reflected in enhanced positive activation and perceived valence. Learners further perceived the instructional material as less complex when positively activated by a same-gender agent, and learning was fostered as a result. Whereas this model-observer similarity effect was rather small for male participants, female learners in particular benefited from learning with an enthusiastic, same-gender agent when dealing with a STEM topic (Karabenick & Urdan, 2014).

Implications

On the theoretical side, positive emotional cues (like enthusiasm) are promising design factors and warrant further investigation. Even though positively activating a learner appears to be generally beneficial, moderating factors can further strengthen these effects. In this study, a social design factor was shown to affect emotional processing and, consequently, learning. In this vein, model-observer similarity or instructor/PA gender should remain a focus of research. Particularly in light of recent theoretical frameworks (e.g., the Cognitive-Affective-Social Theory of Learning in digital Environments; Schneider et al., 2021), it is assumed that social influences in multimedia learning are always present and can therefore interact with emotional design factors.

On the practical side, designers should be aware of the beneficial effects of enthusiastic PAs. PAs should not be viewed merely as instructional tools that deliver information; they can serve multiple functions, for example activating the learner emotionally. Furthermore, PAs offer various design options because they can be animated freely and easily with current software. Designers should be aware that supposedly “banal” design decisions, such as the gender of the agent, can influence learning processes. Educators can therefore easily adapt their PAs to the target group of their digital learning tool. Since short videos were investigated, this might be particularly relevant for educators who provide compact instructional videos on platforms such as YouTube.

Limitations and future directions

Two limitations should be noted; they may encourage researchers to further investigate the role of enthusiastic PAs and model-observer similarity in digital education. First, learner gender was unevenly distributed: far more female than male students participated in this experiment. This reflects the general gender distribution in social science study programs. Consequently, the analyses should be viewed with caution and provide only a first exploratory view of the influence of emotional cues as a function of model-observer similarity. Furthermore, external validity was limited. Different samples might respond differently to enthusiastic cues from artificial instructors.

Second, the videos were short and explicitly produced for the experiment to ensure internal validity. Results might differ for actual video courses, which are much longer. Students might become accustomed to the affective intonation of the voice or the appearance of the PA. Consequently, effects might diminish as the instructional intervention becomes longer, although this is merely a suggestion for a hypothesis to be tested in future research. Additionally, positively activating a learner might not be beneficial for learning in all cases. Beege et al. (2020) pointed out that emotions have to be processed in addition to the learning content; learners should therefore not already be strongly challenged by the learning task. In this vein, different learning scenarios and fields of study should be the focus of further investigations.