
1 Introduction

New educational paradigms, typified by the flipped classroom model, have shifted the focus of education from offering students knowledge to developing students' experience, and from the teacher as a transmitter of knowledge to the teacher as a facilitator of learning. In such an educational process, students are often required to learn basic knowledge and skills by themselves before classroom activities, and the efficacy of those activities depends on the initial knowledge students bring. The development of effective self-education tools has therefore become an important issue in modern education.

Among various computer-assisted self-education environments, a Pedagogical Conversational Agent (PCA) based environment suggests a distinct direction, introducing social relations between human and computer by utilizing human-like embodied features [15, 18]. Previous studies in the field have confirmed favorable effects of PCAs on students' understanding and motivation compared to text-based or audio-based instruction [3, 13, 21]. Furthermore, to better design effective PCAs, a variety of findings from human-human communication have been adopted: for example, the role of the PCA as a learning partner and the impacts of stereotyped appearance, facial expressions, audio expressions, and posture have been examined [4, 11, 12, 23]. Likewise, the impact of gesture is naturally of great concern [18, 24, 28].

In general conversation, speakers' gestures have been shown to help listeners comprehend speech content [9, 16, 17]. Similarly, recent studies of human-human educational interaction have pointed out that gestures in scientific explanation play an important role in students' understanding and collaborative learning [2, 25], and empirical studies have confirmed the learning efficacy of gestures in education [10, 26, 27].

However, few studies in the field of PCA research have addressed the impact of gesture and of different gesture types. Buisine et al. reported that when a PCA used redundant gestures, which convey the same information as the audio instruction, performance on a verbal recall test increased compared to when the PCA used complementary gestures, which convey new information not included in the audio instruction [6, 7]. Buisine et al. also showed that the redundant use of gestures increased the perceived quality of the explanation as well as the likability and expressiveness of the PCA.

Buisine et al. examined the redundancy and complementarity of gesture use, but they mainly employed pointing gestures, which refer to a certain part of the learning material, and gestures which present pictorial features of concrete objects. A gesture can also convey abstract meaning. This aspect of gesture has not been addressed yet, despite the importance of conveying abstract concepts in educational instruction.

To examine the impact of gestures that present abstract meanings in a PCA-based learning system, we conduct an online experiment in which we evaluate learners' performance on a vocabulary recall test and a figure selection test, as well as their answers to a questionnaire, across versions of the system that use gestures matched with abstract speech content, gestures mismatched with speech content, or no gestures at all. This design allows us to examine the impact of gesture on memorization of technical terms, understanding of abstract concepts, learning experience, and perception of the PCA.

In the following section, we will explain the specific type of gesture we aim to examine in this paper.

2 Conduit Metaphoric Gesture

Gestures, usually movements of the hands and arms that spontaneously occur accompanying speech, are classified into four types: iconic, metaphoric, beat, and deictic [19].

Iconic gestures present images of concrete objects or actions, and deictic gestures are used to indicate objects around the speaker. Iconic and deictic gestures may be called gestures of the concrete. Beats are movements synchronized with the rhythmical pulsation of speech which mark the accompanying word or phrase as significant.

In contrast to those three types, metaphoric gestures present abstract meaning as if it had form, by utilizing space. For example, a speaker appears to be holding an object that stands for an abstract entity such as an idea or a memory. This is the most frequently observed metaphoric gesture, and it is a gestural version of the ‘conduit’ metaphor. Conduit metaphoric gestures typically appear as a cup-shaped hand or hands holding something, where the hand or hands represent a container and the word or phrase a substance to be transferred to the listeners [8, 20]. In addition, conduit metaphoric gestures are often used to show the relationship between several concepts by placing them in space.

We will examine the impact of metaphoric gestures performed by a PCA. As the first step of our investigation, we focus on the following three types of conduit metaphoric gestures and their combinations utilizing space.

  • Conduit Gesture: The appearance of holding an object with both hands, indicating a single abstract concept.

  • Two-conduit Gesture: Movement of the object-holding hands from the right side of the speaker's upper body to the left side, synchronized with the words/phrases it accompanies in speech, indicating a schematic relationship between two abstract concepts by utilizing space.

  • Three-conduit Gesture: Movement of the object-holding hands from the right side of the speaker's upper body to the center and then to the left side, synchronized with the words/phrases it accompanies in speech, indicating a schematic relationship between three abstract concepts.

3 Method

In this section, we describe our PCA-based online learning environment, the learning material, and the details of the study design.

Fig. 1. System overview

3.1 E-Learning System and Materials

We developed a web-based learning environment in which a PCA gives speech instruction accompanied by gestures. The animation is processed and rendered in a web browser by a program written in HTML5, JavaScript, and WebGL (a graphics library for JavaScript), so that participants can take part in the experiment without any special preparation of their computers. Figure 1 shows a system overview. The system processes text input with gesture annotations sentence by sentence. Each sentence is divided into phrases (in Japanese, a phrase consists of a content word and function words). The phrases are annotated using the following schema: CC (a conduit gesture holding something with both hands in front of the speaker), CL (a conduit gesture holding something with both hands at the left side of the speaker), and CR (a conduit gesture holding something with both hands at the right side of the speaker). The annotation is interpreted in the web browser, and gesture animations are rendered in synchrony with the audio. To synchronize gestures and speech, we estimated the timing of each gesture from the index of the first character of each phrase, counted from the beginning of the sentence, given that the synthesized audio's speaking rate is constant. Lip-sync animations are also rendered in synchrony with the audio.
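
The following is a minimal JavaScript sketch of this annotation and timing logic, for illustration only; the data layout, the example phrases, and the function name estimateGestureTimings are our own assumptions, not the system's actual code.

  // Phrases of one sentence, some carrying a gesture tag (CC, CL, or CR).
  // Illustrative Japanese phrases: "the client" / "to the server" / "sends a request".
  const annotatedSentence = [
    { text: "クライアントは", tag: "CR" },       // conduit gesture at the speaker's right
    { text: "サーバに", tag: "CL" },             // conduit gesture at the speaker's left
    { text: "リクエストを送ります", tag: null }  // no gesture on this phrase
  ];

  // Estimate when each gesture stroke should start, assuming the synthesized
  // audio reads characters at a constant rate (audioDurationSec / totalChars).
  function estimateGestureTimings(phrases, audioDurationSec) {
    const totalChars = phrases.reduce((n, p) => n + p.text.length, 0);
    const secPerChar = audioDurationSec / totalChars;
    const timings = [];
    let charIndex = 0; // index of the first character of the current phrase
    for (const phrase of phrases) {
      if (phrase.tag) {
        timings.push({ tag: phrase.tag, startSec: charIndex * secPerChar });
      }
      charIndex += phrase.text.length;
    }
    return timings;
  }

  // For a sentence synthesized as 6 seconds of audio:
  console.log(estimateGestureTimings(annotatedSentence, 6.0));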

We prepared an instruction script (a 4-minute speech when synthesized as audio) to be performed. The learning subject was the basics of web application development. The script explains the basic concepts of web applications, the components of a web application, and the programming languages used for web application development. A part of the instruction performed by the system is shown in Fig. 2.

Fig. 2. Metaphoric gestures representing the relationship between the two abstract concepts ‘server’ and ‘user’ ([ ]: stroke and hold)

3.2 Independent Variables

There is one independent variable, gesture, in this study, with the three treatment conditions described below.

  • C1. Speech-gesture Match: Two of the authors, who were familiar with web application development, annotated the script with the schema CC, CL, and CR. The realized instruction included twenty-seven conduit gestures, twelve two-conduit gestures, and three three-conduit gestures; overall, the number of gesture strokes was sixty.

  • C2. Speech-gesture Mismatch: Based on the annotation data described above, we randomly changed each tag to one of the others (a minimal sketch of this tag reassignment is given after this list). The realized animation included forty-two conduit gestures, fifteen two-conduit gestures, and five three-conduit gestures, and the number of strokes was eighty-nine. The percentages of speech-gesture mismatches in the realized animation were 25% for conduit gestures, 65% for two-conduit gestures, and 60% for three-conduit gestures.

  • C3. No Gesture: No gestures are performed; only the audio instruction is presented, with lip-sync animations.
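
As an illustration of how the mismatch condition could be generated, the sketch below replaces each annotated tag with one of the other tags chosen at random; the tag names follow the CC/CL/CR schema described above, but the function name and swapping strategy are our own assumptions rather than the authors' exact procedure.

  // Replace every gesture tag with a randomly chosen different tag.
  const TAGS = ["CC", "CL", "CR"];

  function mismatchTags(phrases) {
    return phrases.map(p => {
      if (!p.tag) return p; // phrases without a gesture stay untouched
      const alternatives = TAGS.filter(t => t !== p.tag);
      const newTag = alternatives[Math.floor(Math.random() * alternatives.length)];
      return { ...p, tag: newTag };
    });
  }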

3.3 Dependent Variables

Dependent variables included memorization of technical terms, understanding of abstract concepts, learning experience, and perception of the PCA. To evaluate these dependent variables, three types of data were used: performance on a vocabulary recall test, performance on a figure selection test, and answers to a forty-six-item questionnaire.

  • Vocabulary Recall Test: Participants were asked to recall the technical terms explained in the learning material, for example, “Name three programming languages often used in server-side development.” In total, they were required to answer fifteen technical terms across six questions.

  • Figure Selection Test: Participants were asked to choose all appropriate figures describing the relationships between abstract concepts: the interaction between the components of a web application (user, client, server, and database), the relationship between kinds of application programs (general application programs, desktop application programs, and web application programs), and the relationship between kinds of programming languages (languages for general purposes, for server-side development, and for client-side development). Nine appropriate figures and eleven inappropriate figures were presented; we counted the number of figures correctly selected plus the number correctly left unselected (a simple scoring sketch follows this list).

  • Questionnaire: The forty-six-item questionnaire (a seven-point Likert scale anchored by “Strongly Disagree” and “Strongly Agree”) was grouped into eight categories: learning motivation, self-efficacy, usefulness of the PCA, reliability of the PCA, human-likeness of the PCA, likability of the PCA, self-reflection on learning, and concentration on learning.
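
For clarity, the figure selection score described above (correct selections plus correct rejections, with a maximum of twenty) can be sketched as follows; the data layout and the function name figureSelectionScore are illustrative assumptions, not taken from the paper.

  // +1 for each appropriate figure that was selected and +1 for each
  // inappropriate figure that was left unselected (maximum 9 + 11 = 20).
  function figureSelectionScore(figures, selectedIds) {
    const selected = new Set(selectedIds);
    let score = 0;
    for (const fig of figures) {
      const chosen = selected.has(fig.id);
      if ((fig.appropriate && chosen) || (!fig.appropriate && !chosen)) {
        score += 1;
      }
    }
    return score;
  }

  // Example: figureSelectionScore([{ id: 1, appropriate: true }, { id: 2, appropriate: false }], [1]) === 2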

3.4 Participants and Procedure

The participants were 120 undergraduate students (84% male, 16% female) in a web application development course held at a large private university located in Kanagawa Prefecture, Japan. Students were required to participate in the experiment as part of the course activities, but they were told that the results would not affect their grades.

The three versions of the e-learning system were deployed on a server, and the URL of the introduction page was announced to the students, with instructions to prepare the latest version of Firefox and speakers or headphones; the URL expired after one week. The students were randomly assigned to an experimental condition when they were transferred from the introduction page.

Figure 3 shows the flow of learning and testing. The participants first watched the 4.5-minute instruction, then completed the questionnaire, followed by the tests. The questionnaire also served as a distraction task for the subsequent tests.

Fig. 3. Procedure of the experiment

3.5 Design and Hypothesis

The study employed a one-factor three-level between-participants design. The data were analyzed by one-way analysis of variance (ANOVA) with Tukey’s HSD post-hoc comparisons.
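
For reference, the one-way ANOVA statistic used here is the ratio of between-condition to within-condition mean squares, with degrees of freedom (k - 1, N - k) for k conditions and N participants:

\[ F \;=\; \frac{\dfrac{1}{k-1}\sum_{i=1}^{k} n_i\,(\bar{X}_i - \bar{X})^2}{\dfrac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2} \]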

We hypothesized that conduit gestures draw learners' attention to technical terms and help learners remember the words, resulting in higher vocabulary recall test scores, and that two-conduit and three-conduit gestures schematically convey the relationships between concepts by utilizing space, resulting in higher figure selection test scores, when the instruction speech is given with matched gestures than when it is given with mismatched gestures or no gestures. In addition, we hypothesized that learners rate their learning experience and their perception of the PCA higher when the PCA performs metaphoric gestures matched with the speech.

4 Results and Discussion

Of the 120 students, nine did not access the URL, three did not complete the test, and one reported a web browser error. In addition, the data of ten students who took longer than sixty minutes to complete the study were excluded. Eventually, data from 97 students were obtained. The numbers of participants assigned to each condition were 32 for C1, 32 for C2, and 33 for C3.

Fig. 4. Results of the vocabulary recall test and the figure selection test

Fig. 5. Examples of an appropriate choice and an inappropriate choice that participants in the speech-gesture mismatch group answered incorrectly (Q: Select all appropriate figures which describe the interaction between components of web applications).

4.1 Memorization and Understanding

The results of the vocabulary recall test and the figure selection test are shown in Fig. 4. Although no statistically significant difference was found in the recall test results, we found a significant main effect of the gesture factor in the figure selection test (F(2,93) = 3.09, p = .007). The students in the speech-gesture match condition scored significantly higher (M = 15.7, SD = 2.59) than the students in the speech-gesture mismatch condition (M = 13.9, SD = 2.93, t(62) = 2.61, p = .011) and the students in the no-gesture condition (M = 13.8, SD = 2.52, t(63) = 2.94, p = .028). The standardized effect sizes for these differences were Cohen's d = 0.63 and 0.76, which indicate medium to large effects.
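
For reference, these effect sizes follow the usual pooled-standard-deviation form of Cohen's d; the exact pooling used in the original analysis is an assumption on our part, but substituting the reported means and standard deviations yields values in the reported range.

\[ d \;=\; \frac{M_1 - M_2}{s_{\mathrm{pooled}}}, \qquad s_{\mathrm{pooled}} \;=\; \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}} \]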

Figure 5(a) shows an example of an appropriate figure and Fig. 5(b) an example of an inappropriate figure for the question “Select all appropriate figures which describe the interaction between components of web applications.” The percentage of correct answers for these choices in the speech-gesture mismatch condition was more than 20% lower than in the speech-gesture match group, where it was over 80%. Figure 6 shows the annotated script in both conditions. As seen in Fig. 6, in the speech-gesture match condition, the conduit gestures present the appropriate spatial layout of the abstract components of a web application, as depicted in Fig. 5(a). In contrast, in the speech-gesture mismatch group, conduit gestures were placed in space meaninglessly.

Fig. 6. A part of the script annotated differently in the speech-gesture match condition and the speech-gesture mismatch condition, describing the interaction between the three components of a web application, “client program,” “web application server,” and “database” ([ ]: stroke and hold; translated from Japanese).

These results support our hypothesis that the spatial use of conduit metaphoric gestures helps listeners' schematic understanding of the relationships between abstract concepts. However, a single conduit gesture presenting an abstract concept did not affect learners' memorization of the name of the concept.

4.2 Learning Experience and Perception of the PCA

We did not find many significant differences between conditions in the questionnaire results, but they revealed a significant main effect of the gesture factor on the perceived reliability of the PCA (F(2,94) = 3.57, p = .032).

Interestingly, despite the incoherence between speech content and gesture representation, the post-hoc comparisons showed that the students in the speech-gesture mismatch condition rated the PCA as more reliable (M = 4.45, SD = 0.97) than the students in the speech-gesture match condition (M = 3.74, SD = 1.21, t(62) = 2.59, p = .027). The standardized effect size for this difference was Cohen's d = 0.65, which indicates a medium to large effect. The results for each item in the category are shown in Table 1.

Previous psychology studies have revealed that non-verbal behaviors, including gestural styles, are linked to personality [1, 5]. This argument has also been partially confirmed in studies of interaction between humans and animated characters [14]. Neff et al. reported that the perception of extroversion increased when a PCA's gesture rate was high and when its movements were produced quickly [22]. In the speech-gesture mismatch condition, the number of gesture strokes was larger than in the speech-gesture match condition even though the length of the audio speech was identical, and rapidly performed strokes were often seen in the realized animation. This suggests that the difference in gesture speed changed the perceived extroversion of the PCA, which in turn changed the perceived reliability.

Table 1. Questions asking about the reliability of the PCA (Cronbach's α = 0.84)

5 Conclusion

We examined the impact of metaphoric gestures performed by a Pedagogical Conversational Agent (PCA) on learners' memorization of technical terms, understanding of relationships between abstract concepts, learning experience, and perception of the PCA. The study employed a one-factor, three-level between-participants design in which we manipulated the gesture factor (speech-gesture match vs. speech-gesture mismatch vs. no gesture). Data from 97 students were acquired in an online learning environment. As a result, while no effect was found on memorization of technical terms, students showed more accurate schematic understanding of the relationships between abstract concepts when the PCA used metaphoric gestures matched to the speech content than when it used mismatched gestures or no gestures. In contrast, we also found that students judged the PCA more useful and helpful, and felt that it was more like a teacher, when it performed gestures mismatched to the speech content than when it performed matched gestures.