Introduction

The development of Intelligent Tutoring Systems (ITS) has a long tradition in technology-enhanced learning (Sleeman and Brown 1982). The evolution of ITS, along with research in cognitive psychology and artificial intelligence, gave rise to the concept of the “pedagogical agent” (Wenger 2004). In computer science, a pedagogical agent can be regarded as an entity that has specific learning goals and ‘acts’ autonomously in educational settings (Gulz et al. 2011).

Considering the conversational capabilities of recent ITSs, a new subgroup of pedagogical agents has emerged: “conversational agents” (Kerly et al. 2009). Conversational agents typically engage learners in a conversation (or simple communication) by using natural language, body movements or even facial expressions (Gulz et al. 2011).

Our work focuses on adaptable conversational agents that use natural language to scaffold peer discussions in collaborative learning settings. The use of conversational agents in the context of Computer-Supported Collaborative Learning (CSCL) constitutes a relatively new area of research aiming to improve students’ learning experience and outcomes (e.g. Chaudhuri et al. 2008; Kumar and Rosé 2011).

In the following, we present: a) the theoretical background of our work, b) an overview of a prototype dialogue-based ITS, namely MentorChat, which uses a conversational agent to promote students’ productive dialogue, and c) a pilot study conducted to contrast the impact of two different agent intervention strategies. The pilot study results provide encouraging evidence regarding the user acceptance of the system.

Theoretical Background

Conversational Agents

A conversational agent can hold a discussion with students in different ways, and the type of conversation may vary from spoken (Wik and Hjalmarsson 2009) to text-based (Chaudhuri et al. 2009) or even non-verbal (Ruttkay and Pelachaud 2004). Likewise, the visual appearance of the agent may be human-like or cartoonish, static or animated, two-dimensional or three-dimensional (Dehn and van Mulken 2000). Conversational agents that have a visual representation are often labeled as “embodied” or “animated” conversational agents (Cassell et al. 2000), especially in situations where it is important for the agent to appear more lifelike. In fact, it was found that the visual appearance and ‘personal’ features are crucial to designing an effective pedagogical agent, regardless of the level of its computational functionality (Baylor 2009; Karacora et al. 2012). Over the past decade, conversational agents have been developed to serve multiple pedagogical roles, such as tutors, coaches or learning companions (Haake and Gulz 2009). In addition, many dialogue-based ITSs have utilized conversational agents to meet a variety of educational needs such as question-answering (Feng et al. 2006), tutoring (Heffernan and Croteau 2004; VanLehn et al. 2007), language learning practice (Hjalmarsson et al. 2007; Wik and Hjalmarsson 2009), and the promotion of health-related behavioral changes (Kennedy et al. 2012) and of metacognitive skills (Kerly et al. 2008).

Many of the conversational agents developed in the past aimed at engaging students in one-to-one (student-agent) tutorial dialogues (Evens and Michael 2006; Graesser et al. 2008; VanLehn et al. 2007). Such agents typically simulated the instructional behavior of a human tutor by exploiting AI techniques to converse with students on pre-determined topics. Based on the observations of VanLehn et al. (2007), a student-agent tutorial dialogue can offer several benefits over a monologue, such as the detection and remediation of failed communication, the correction of inaccurate student knowledge and increased interactivity.

Indeed, many success stories (e.g. Why2-Atlas, AutoTutor, ITSPOKE, Ms Lindquist, CALMsystem, ReTuDiS) have been reported regarding the beneficial use of such agents in individual learning settings (Aleven et al. 2004; Graesser et al. 2005a; Grigoriadou et al. 2005; Heffernan and Croteau 2004; Litman and Silliman 2004; VanLehn et al. 2007). Among other findings, the results indicated learning and memory gains (Graesser et al. 2005b; VanLehn et al. 2007), an increase in student motivation (Heffernan and Croteau 2004) and improvement of self-explanation (Aleven et al. 2004; Grigoriadou et al. 2005) as well as of self-assessment skills (Grigoriadou et al. 2005; Kerly et al. 2008).

Nevertheless, a literature review reveals some points of criticism regarding the impact of conversational agents on individual learning, as follows:

  • The shallow interactions often taking place between the student and the agent (Jia 2004). Rosé et al. (2004) found that students frequently provided simpler answers to the conversational agent (e.g. one-word responses) than to human tutors;

  • The agents’ role as teacher substitutes often leading to the exclusion of teachers from the educational process (Shawar and Atwell 2007);

  • The agents’ strict dependence on the instructional domain, which may limit (a) the agent’s ability to adapt and operate in a new learning context (Dyke et al. 2012) and (b) the teacher’s ability to configure the agent (Kinshuk 2002).

Conversational Agents in CSCL

More recently, researchers turned their attention to designing ITS systems utilizing conversational agents for supporting collaborative learning activities (e.g. Chaudhuri et al. 2009; Dyke et al. 2012; Kumar and Rosé 2011; Stahl et al. 2010; Walker et al. 2011).

Research in the field of Computer-Supported Collaborative Learning (CSCL) has consistently revealed that collaboration can foster students’ domain knowledge and also domain-general skills, such as argumentation, critical thinking and problem-solving ability (Dillenbourg 1999; Weinberger and Fischer 2006). The key learning mechanism in collaboration is peer interaction; however, there is no guarantee that productive interactions will actually occur (Dillenbourg and Tchounikine 2007). For instance, although the ‘collaboration contract’ implicitly assumes that all partners will contribute to a problem-solving activity, this is not always the case. Research in CSCL has consistently reported on this issue and proposed various methods for increasing the probability that productive peer interactions will actually occur (Dillenbourg 1999). One such method to guide the collaborative activity and foster peer interactions is the use of a didactic scenario or “collaboration script” (Kobbe et al. 2007).

Following what seems to be another promising path, several research groups have started exploring the use of conversational agents to trigger productive forms of peer dialogue and scaffold students’ learning in a CSCL context. The key objective of these efforts is to design the agent’s behavior in a way that leverages constructive conversational peer interactions (e.g. argumentation, explicit explanation, conflict resolution, mutual regulation) (Fischer et al. 2013). The results have been quite encouraging so far, indicating that the use of conversational agents for providing collaborative learning support can significantly enhance the learning outcomes (e.g. Chaudhuri et al. 2008; Dyke et al. 2012; Kumar et al. 2007b).

Many of the studies above explore the impact that conversational agent interventions may have on the quality of peer dialogue and, consequently, on group and individual learning outcomes. For example, the use of tutorial dialogue agents that encourage peers to engage in directed lines of reasoning, also known as knowledge construction dialogues, has been reported to significantly improve learning outcomes (Chaudhuri et al. 2008; Kumar et al. 2007b). In addition, it was revealed that the unsolicited supportive prompts of an agent can be more effective than solicited supportive prompts (Chaudhuri et al. 2009).

Following a similar rationale, Walker et al. (2011) investigated the role of a conversational agent system that displayed reflective prompts during a reciprocal peer tutoring scenario, where two students took turns tutoring each other. The results showed that the adaptive agent support led students to increase the conceptual content of help in their utterances. Wang et al. (2007) also found that students interacting with a conversational agent during a collaborative brainstorming activity produced substantially more ideas than individuals brainstorming with a human peer. However, one should also keep in mind Kumar et al.’s (2007b) observation that collaborating peers do not always interact productively with conversational agents, often ignoring their prompts.

Dyke et al. (2012) and Stahl et al. (2010) explored the use of conversational agents that scaffold collaborative learning discussions through an approach called “Academically Productive Talk” (Michaels et al. 2008). The agents employed a number of supportive moves, which have proven helpful when used in classrooms by teachers. The results provided evidence that one form of support, named “Revoicing”, which encourages students to clarify and/or expand their answers, had a significant positive effect on learning and resulted in higher quality group explanations (Dyke et al. 2012).

Another series of studies focused on the benefits of using conversational agents employing social interaction strategies in collaborative learning situations. Once again, results showed that agents engaging in off-task as well as task-related conversations can enhance learning by efficiently managing students’ attention and focus on the task (Ai et al. 2010a; Gulz et al. 2011; Kumar et al. 2010). Kumar et al. (2007a) also reported a strong positive effect on the attitude that students displayed toward the socially capable agents. In addition, it was found that idea generation productivity can be effectively supported by a conversational agent that makes both instructional and socially-oriented interventions (Kumar et al. 2010).

Nevertheless, along with the abovementioned encouraging outcomes, important research questions have arisen concerning the practical design and implementation of conversational agents in a CSCL context. For instance, which learner should the agent address a question or a comment to when scaffolding a group? How should the agent identify when it is being addressed, and which learner is addressing it (Kumar and Rosé 2011)? What should the balance be between the on-task and off-task conversations between the student and the agent (Gulz et al. 2011)? What types of collaborative problems are suitable for this type of tutoring (Harrer et al. 2006)? What should the teacher’s role be when using an agent in a collaborative learning setting? And finally, are the effort and expense required to develop such a system worth the outcome (Walker et al. 2011)?

Productive Dialogue

A key pattern emerging from the above studies is that an agent’s intervention in a collaborative setting can be beneficial for the learners if it can support a sustained ‘productive peer dialogue’. However, identifying what a productive peer dialogue is, regardless of the domain, is certainly not a trivial task. In the past, many theorists explored the features that can account for a productive conversation from both a cognitive and a socio-cultural perspective (Stahl and Rosé 2011). Despite differences in conceptualization and terminology, as in “group cognition” (Stahl 2006), “uptake” (Suthers 2006) or “transactivity” (Weinberger and Fischer 2006), their research findings illustrated some consistencies originating from the Vygotskian (Schwartz 1998) and Piagetian (De Lisi and Golbeck 1999) frameworks. The core idea was that knowledge construction during a dialogue occurs through a series of steps, where mental models are articulated, shared, mutually examined and potentially integrated (Sionti et al. 2012; Stahl and Rosé 2011).

The above process lies at the heart of transactivity theory, proposed as a basic theoretical construct for explaining collaborative knowledge construction (Noroozi et al. 2013). Berkowitz and Gibbs (1983) define the term transactivity as “reasoning operating on the reasoning of the other”. According to Ai et al. (2010b), transactive contributions are arguments constructed upon the previously expressed reasoning of self or others. Likewise, Fischer et al. (2013) indicate that a discourse is considered to be transactive when learners use their partners as their resources, building on their earlier contributions.

In the CSCL domain, the use of transactivity has been repeatedly highlighted as a valuable indicator of the learning taking place in group discussions (Sionti et al. 2012). Teasley (1997) found that transactive contributions are positively correlated to learning outcomes in collaborative problem-solving settings. Similarly, Chi (2009) reported that learning activities involving students in using one another as information resources, building on each other’s thoughts, are associated with better learning outcomes as compared to other types of activities. In line with these findings, Stegmann et al. (2011) indicated that a higher level of transactivity can significantly enhance individuals’ domain knowledge. Moreover, Noroozi et al. (2013) revealed that transactivity can increase the quality of both the group and the individual problem solutions produced in a problem-solving activity, while both the externalization of students’ own knowledge and the elicitation of their partner’s knowledge can facilitate collaborative learning.

Another approach consistent with the transactivity perspective is the framework of Academically Productive Talk (APT), which considers social interactions to play a prominent role in inducing beneficial mental processes (Resnick et al. 2010). According to APT, a very effective teaching strategy is to help learners externalize their thinking by “sharing their reasoning out loud” (Michaels et al. 2008). Indeed, a major issue is that some learners tend to avoid making their perspectives explicit to the group so that a common ground can be negotiated (Weinberger et al. 2007). Externalization is especially valuable in written dialogue, where students’ externalized representations can serve as explicit references that facilitate peer interactions and grounding processes (Papadopoulos et al. 2013; Oehl and Pfister 2009). Dillenbourg and Jermann (2007) also underline that the existence of mutual explicit references between learning partners is essential for the success of their collaboration.

MentorChat: A Design Proposal

Against the above background, we argue that the development of a conversational agent to support learners in collaborative learning should take into account the current research evidence on:

  • The value of agent intervention (prompting) in helping learners sustain a ‘productive peer dialogue’;

  • ‘Transactive dialogue’ as a form of productive peer dialogue.

Furthermore, it is our position that the agent should not be designed as a teacher replacement, but should serve as an intelligent tool enhancing the impact of the collaboration strategies employed by the teacher, who always remains responsible for the orchestration and scaffolding of the educational activity.

Since it is well documented that peer interaction is a key learning mechanism in collaborative learning, we argue that agent design should not necessarily focus on exhaustively modeling each learner’s understanding by using complex knowledge structures for each different domain. Instead, we suggest that attention should be given to (a) identifying efficient techniques of modeling and triggering constructive peer interactions through appropriate agent interventions (prompts), and (b) promoting the “teacher-as-activity-designer” perspective, by enabling teacher-led configuration of agent behavior (for example, domain modeling and defining agent intervention rules).

In this way, we expect to develop a conversational agent system that (a) has considerable pedagogical value, engaging peers in a highly transactive dialogue by delivering well-targeted interventions, (b) has a significantly lower development cost, as it does not necessitate hardwiring an explicit domain model and is therefore reusable, and (c) enjoys broader acceptance by teachers for being teacher-configurable. We consider that this line of work departs from the typical design of dialogue-based dynamic support systems, which are tailored to specific learning populations and domains. Indeed, we believe that this approach can lead to design specifications for conversational agent systems that broaden their scope of use as configurable, domain-independent intelligent tools for scaffolding diverse student populations collaborating in various instructional domains.

Following this perspective, we have designed and developed a dialogue-based ITS system called MentorChat. In the next section, we present the MentorChat system architecture, explain our research motivation and provide a list of what we consider important research questions. In this work, we present a pilot study (as an initial step in this research direction) focusing on what we call “the agent intervention mode”, that is, which learner the agent should address when intervening in the peer discourse.

MentorChat: System Architecture Overview

MentorChat has been developed as a cloud-based application that requires little time to configure and can be used in different learning contexts. The system is available in both English and Greek and uses web technologies such as MySQL, PHP, AJAX, HTML5 and CSS3. The overall architecture of MentorChat consists of three core modules: the student, the conversational agent and the teacher module (Fig. 1).

Fig. 1 The abstract system architecture

Students log in to MentorChat (student module) to participate in synchronous collaborative activities (text-based chatting). An activity can include many online discussion sessions (phases), each on a different topic. In each phase, students in a group are asked to discuss and submit their joint answer to an open-ended domain question. During peer discourse, the MentorChat conversational agent (conversational agent module) is responsible for tracking and analyzing the group dialogues (peer interaction model), identifying opportunities for intervention. Based on the information provided by the teacher (teacher module), which essentially is a domain knowledge representation (domain model) relevant to the topic of discussion, the agent decides when to initiate a supportive intervention. To accomplish this, the agent implements an intervention strategy and presents prompts to the learners in an appropriate manner (intervention model).
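To make this flow concrete, the following is a minimal sketch in PHP (the system’s stated implementation language) of how the three agent-side models could cooperate on each incoming chat message. All names, data structures and the simple keyword matching shown here are illustrative assumptions, not MentorChat’s actual code; the module descriptions below elaborate on each stage.

```php
<?php
// Illustrative sketch of the per-message pipeline implied by Fig. 1.
// Stage 1 plays the role of the peer interaction model, stage 2 the
// domain model, stage 3 the intervention model.

function onMessage(string $author, string $message, array &$state): ?string
{
    // 1. Peer interaction model: record which key concepts this learner mentioned.
    foreach ($state['concepts'] as $concept) {
        if (stripos($message, $concept) !== false) {
            $state['mentioned'][$author][] = $concept;
        }
    }
    // 2. Domain model: a mentioned concept not yet prompted is an opportunity.
    $candidates = array_diff($state['mentioned'][$author] ?? [], $state['prompted']);
    if (!$candidates) {
        return null; // no opportunity; let the peers keep talking
    }
    // 3. Intervention model: act on the first open opportunity.
    $concept             = reset($candidates);
    $state['prompted'][] = $concept;
    return "Why do you think that {$concept} is important?";
}

$state = ['concepts' => ['success', 'motivation'], 'mentioned' => [], 'prompted' => []];
echo onMessage('Chris', 'I think motivation matters most here', $state);
```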

Student Module

Figure 2 displays a screenshot of the MentorChat interface for learners. At the top left corner of the screen, there is a progress bar that indicates both the current phase and the total number of phases of the MentorChat activity (Fig. 2, A). Next to it, a discussion topic is presented to the learners (Fig. 2, B), followed by a chat frame (Fig. 2, C). Also, in the left section of the interface (Fig. 2, D), there is a users’ list that displays the online group members participating in this phase. Next to each username, there is an icon representing the role of the user in the current phase of the activity.

Fig. 2 A screenshot of the MentorChat student interface

When learners log in to MentorChat using their web browser, they are forwarded to a private discussion channel according to their assigned group. Subsequently, a conversational agent provides the students with guidance pertaining to their role and task in the current phase of the activity. The interface employs an animated human-like two-dimensional avatar to serve as the agent representation (Fig. 2, E). In addition, a Text-To-Speech engine is utilized to direct the attention of users and enrich the agent modality.

The conversational agent also attempts to efficiently manage students’ attention by employing an “Attention Grabbing” strategy, which we have designed based on the work of Kumar and Rosé (2011). According to this technique, when the agent detects an opportunity for support, it does not interrupt students’ discussion by immediately making an intervention. It first displays a question mark icon above its avatar and, only after some time has passed, displays its message. In this way, the system prepares learners for the intervention to follow, turning their attention to the agent.

Furthermore, in contrast to the design of previous MentorChat versions, in which all agent messages were presented in the main chat window (Fig. 2, C), the current system version (v.2.5) displays the agent interventions in a bubble that dynamically appears next to the agent avatar (Fig. 2, F). The rationale behind this design decision was derived from the findings of our previous study and responded to shortcomings we identified (Tegos and Demetriadis 2012). More specifically, we observed that students sometimes (a) ignored the agent interventions until they had completed their on-going interaction, and (b) if they did decide to respond, they sometimes had trouble scrolling up the chat list to locate the last agent intervention after some time had passed. As a result, we decided to implement the above agent intervention interface to give students constant access to the agent’s question, so that they could respond whenever they felt ready. This mechanism also proved helpful in identifying which student of the group had provided a direct answer to the agent. After an answer has been submitted, the bubble fades out and both the agent question and its corresponding answer appear in the chat frame.

Teacher Module

A teacher can configure MentorChat to design and deploy a collaborative activity by dividing learners into groups, assigning roles to learners, setting up the sequence of the collaborative tasks, and modeling the activity domain, which will be used by the agent to scaffold learners’ discussions. In addition, the teacher can monitor learners’ interactions while the activity takes place. All these functions are accessible via the following administration panels: a) the user management, b) the activity structure, c) the domain modeling, and d) the monitoring panels (Fig. 1).

User Management Panel

The user management panel is responsible for the administration of all participants of the collaborative activity. Using its interface, a teacher can add, delete or edit information such as the username, the password or the group of students. In addition, the teacher can select the roles that will be assigned to each user at the beginning of the activity.

Activity Structure Panel

The activity structure panel provides the ability to manage the phases of the collaborative activity. In particular, the teacher can create a phase by entering an open-ended domain question or a debate to be resolved (Fig. 2, B). As mentioned before, a phase represents the topic of discussion that encourages students to collaborate in order to reach a common agreement and submit the final answer of their group. Teachers are not restricted to using only text in the discussion topics; they can also insert hyperlinks, images or even videos to present specific concepts to their students in a more comprehensible and engaging manner.

Domain Modeling Panel

The domain modeling panel is essentially an authoring tool that enables the teacher to model the domain and, through that, the agent interventions. It should be noted that learners can only see the final agent intervention appearing on the screen and not the authoring panel. Modeling the domain is possible in two different ways, using either (a) the automatic or (b) the manual modeling method.

The automatic modeling method is based on a concept mapping interface (Fig. 3, A). Through this interface, the teacher can create a concept map by entering a set of ‘facts’. In a basic scenario, a fact is a simple sentence which consists of a subject, an object and a verb. The system interprets these elements as a relationship (as stated by the verb) between two concepts (as stated by the subject and the object), and stores them in a computational format. More specifically, when the teacher enters a fact, the system renders it dynamically, visualizing the nodes and their relationship as a concept graph. This graph constitutes a domain knowledge representation, which is utilized by the agent for implementing its interventions in various ways. For instance, if the system detects that students are talking about two relevant domain concepts, the agent may ask students to comment on whether or how these concepts relate to each other. In this manner, based on the teacher’s graph, which might actually include some incorrect relationships, the agent may ask learners “do you agree or disagree with the following statement: ‘success leads to happiness’? Comment on that”. The automatic modeling method is currently a work in progress that will be discussed and assessed in a future study.

Fig. 3 The interface of the automatic modeling method
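As an illustration of the fact-to-graph idea, the toy PHP snippet below turns a subject-verb-object ‘fact’ into an edge between two concept nodes. The naive whitespace split (first token = subject, second = verb, rest = object) is our simplifying assumption; the actual panel handles facts through its graphical interface.

```php
<?php
// Toy illustration: a simple 'fact' becomes one edge of the concept graph.
// Assumes the fact is well-formed as subject-verb-object.

function factToEdge(string $fact): array
{
    $words = preg_split('/\s+/', trim($fact));
    return [
        'subject'  => $words[0],                             // first concept node
        'relation' => $words[1],                             // edge label (the verb)
        'object'   => implode(' ', array_slice($words, 2)),  // second concept node
    ];
}

$graph = array_map('factToEdge', [
    'success leads-to happiness',   // hyphenated so the verb stays one token
    'education enables success',
]);
print_r($graph);
// A stored edge can later back an agent question, e.g. "Do you agree or
// disagree with the following statement: 'success leads to happiness'?"
```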

The manual modeling method enables teachers to gain low-level control of the agent’s behavior via a set of rules that determine the agent interventions. Typically, a teacher-defined rule, which is applicable for a selected phase, can consist of three parts: an ‘event’ (mandatory), a ‘condition’ (optional) and an ‘intervention’ (optional). The event is a linguistic pattern (e.g. a key word) that serves as an opportunity for the display of an agent intervention if the defined condition is met (e.g. after students have discussed a specific concept). The intervention, which is the only part of the rule displayed to the students, may include not only text but also system operators or other forms of multimedia content such as images, videos or hyperlinks. The operators are system-reserved words that either activate special agent functions, such as facial expressions (e.g. “_sad_”, “_surprised_”), or serve as variables (e.g. “_timestamp_”), which are replaced with the corresponding text when the intervention is delivered.
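A teacher-defined rule of this kind might be represented as sketched below; the field names, the regular-expression event and the operator handling shown are illustrative assumptions rather than the system’s actual schema.

```php
<?php
// Sketch of one manual-modeling rule: event (mandatory), condition
// (optional), intervention (optional, possibly containing operators).

$rule = [
    'event'        => '/bad mood|think positively/i',         // linguistic pattern
    'condition'    => fn(array $ctx) => $ctx['phase'] === 1,  // applies in phase 1 only
    'intervention' => '_surprised_ Interesting! Can you explain that? (_timestamp_)',
];

function applyRule(array $rule, string $utterance, array $ctx): ?string
{
    if (!preg_match($rule['event'], $utterance) || !$rule['condition']($ctx)) {
        return null;
    }
    // Variable operators are substituted with text; expression operators such
    // as _surprised_ would be consumed by the avatar layer in the real system.
    return trim(strtr($rule['intervention'], [
        '_timestamp_' => date('H:i'),
        '_surprised_' => '',
    ]));
}

echo applyRule($rule, 'I was in a bad mood yesterday', ['phase' => 1]);
```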

Moreover, in the manual modeling method, a teacher may decide to omit entering the intervention part of a rule, allowing the system to automatically generate the intervention from scratch. In this case, the agent draws upon the available system dictionaries and the teacher-defined triggering event (for example, the presence of a key concept in the students’ dialogue) to dynamically synthesize the agent intervention. The intervention is formed based on one of the following intervention types: a) encouraging students to explicate their reasoning regarding the key concept (e.g. “Why do you think that X is important?”) and b) asking students to apply their own reasoning to their partner’s reasoning (e.g. “Do you agree or disagree with your partner about X? Why?”). The selection of these two supportive actions was based on the framework of APT, which refers to them as ‘productive talk moves’ (Resnick et al. 2010).
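A minimal sketch of this synthesis step, assuming simple string templates for the two talk moves (the template wording is ours, modeled on the examples above):

```php
<?php
// Sketch: slot the detected key concept (and, for move b, the partner's
// name) into one of the two APT-style templates.

function synthesizeIntervention(string $concept, string $partnerName): string
{
    $templates = [
        "Why do you think that {$concept} is important?",                      // move (a)
        "Do you agree or disagree with {$partnerName} about {$concept}? Why?", // move (b)
    ];
    return $templates[array_rand($templates)];
}

echo synthesizeIntervention('motivation', 'Janna');
```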

The authorability of domain models in rule-based dialogue management systems is generally considered to be associated with the sophistication and efficiency of the conversational agents (Sagae et al. 2011). In an attempt to reduce teachers’ workload and support authors of different backgrounds (e.g. novice or experienced), we have decided to implement both a higher-level authoring method, utilizing concept mapping as a powerful graphical tool for organizing and representing knowledge (Novak and Cañas 2008), and a rule-based method that enables teachers to manage lower-level details of the agent behavior (Sagae et al. 2011). Our system does not identify multiple dialogue acts in a single student utterance, as other dialogue management systems do (e.g. the classifier-based approach of Morbini and Sagae 2011); still, we consider our approach to offer significant advantages for the reusability of teacher-defined domain models. This is accomplished by providing access to a library of pre-built concept maps and behavioral rules that can facilitate the domain modeling process.

Monitoring Panel

The monitoring panel allows the teacher to observe the group discussions in real time. There are several filters, such as the student’s name, the date-time or the discussion topic, which enable the teacher to isolate and focus on specific fractions of students’ discourse. In addition, the monitoring panel employs a set of visualized interaction analysis indicators. These are categorized according to their focus on either individual learners (e.g. number of posts, post frequency) or groups (e.g. on-task rate, number of agent interventions). Some interaction analysis indicators have a comparative role and provide information about the interactions taking place during the activity (e.g. who the most active participants are, which group discussions have covered all the teacher-defined key topics).

In addition to the above administration panels, there is also a general settings panel, which enables the teacher to customize specific options of the educational activity. In particular, the teacher can alter features such as the name of the conversational agent, the role instructions, the activity timeframe, the system dictionaries and many others.

Conversational Agent Module

The architecture of the conversational agent module is based on three models: a) the peer interaction, b) the domain and c) the intervention models (Fig. 1).

Peer Interaction Model

The peer interaction model analyzes and stores the most important aspects of students’ interactions. More specifically, this model is responsible for identifying and recording all the interactions that could serve as opportunities for a potential intervention. Therefore, by tracking the topics of discussion, the peer interaction model forms the knowledge representations of both the group and the individual learners. These representations are stored in an appropriate format so that they can be easily processed by another computational model.

Domain Model

The conversational agent module exploits the domain model arising from the concept map or the set of rules prescribed by the teacher in the domain modeling panel. In each phase, the teacher-defined domain model (i.e. knowledge representation) is compared with the corresponding individual’s or group’s knowledge representations to determine whether an agent intervention is appropriate.

To perform the comparison of these knowledge models, MentorChat utilizes a suffix-stripping stemming algorithm in conjunction with a set of appropriately structured dictionaries. The stemmer used is based on two slightly revised PHP versions of the Porter stemming algorithm (Porter 1997) and the Saroukos stemming algorithm (Saroukos 2008), which reduce inflected or derived words to their stems or root forms. On the basis of this information, MentorChat is then able to use the available dictionaries in order to recall the corresponding synonyms and antonyms. These are essential for (a) the accuracy of the pattern matching algorithms used in MentorChat and (b) the natural language generation required for the synthesis of the agent interventions. The latter system feature is also enhanced by a large pool of stored phrases that can be used to increase the diversity of the agent interventions made.
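The snippet below sketches the matching principle with a deliberately toy suffix-stripping stemmer and a one-directional synonym lookup; the real system relies on the revised Porter and Saroukos stemmers and richer dictionaries, so every detail here should be read as an illustrative assumption.

```php
<?php
// Toy suffix-stripping stemmer plus synonym expansion, illustrating how two
// terms can be judged equivalent for pattern matching.

function toyStem(string $word): string
{
    $word = preg_replace('/(ing|ed|es)$/', '', strtolower($word));
    return preg_replace('/(?<!s)s$/', '', $word); // strip plural 's', keep 'ss'
}

// Dictionary row in the spirit of the next paragraph: a word and its synonyms.
$synonyms = ['success' => ['achievement', 'accomplishment']];

function termsMatch(string $a, string $b, array $synonyms): bool
{
    $a = toyStem($a);
    $b = toyStem($b);
    if ($a === $b) {
        return true;                                   // same stem
    }
    $pool = array_map('toyStem', $synonyms[$a] ?? []); // one-directional lookup
    return in_array($b, $pool, true);
}

var_dump(termsMatch('successes', 'success', $synonyms));    // true, via stemming
var_dump(termsMatch('success', 'achievements', $synonyms)); // true, via synonyms
```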

The system dictionaries have a flexible structure and can be managed through the interface of a dictionary management panel. Creating a dictionary is a relatively simple process, which can be done by uploading a spreadsheet file (.csv or .xls) with specific headers (e.g. ID, Word, Synonyms and Antonyms). Some dictionaries can be general-purpose and thus usable in various activities, while others can include domain-specific terminology that complements the general dictionaries when these do not suffice for a particular task. Hence, whenever the teacher models the domain of the activity (via the domain modeling panel), he/she can select which of the listed dictionaries will be used by the system to enhance agent performance (Fig. 3, B). For instance, a teacher organizing an activity in a specific instructional domain can readily create a new dictionary that includes a list of synonym terms relevant to the task. Thus, if the teacher selects the domain-specific dictionary during the setup of the activity domain model, he/she enables the MentorChat pattern matching algorithms to detect all the terms that carry the same meaning for the specific activity phase.
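Loading such a spreadsheet could look like the sketch below, assuming a plain CSV export with the headers described above (ID, Word, Synonyms, Antonyms) in which multiple synonyms or antonyms within a cell are separated by semicolons; the in-cell delimiter is our assumption.

```php
<?php
// Sketch: read a dictionary CSV with the headers ID, Word, Synonyms, Antonyms
// into an in-memory lookup table keyed by the word.

function loadDictionary(string $path): array
{
    $dict = [];
    $fh   = fopen($path, 'r');
    fgetcsv($fh);                                   // skip the header row
    while (($row = fgetcsv($fh)) !== false) {
        if (count($row) < 4) {
            continue;                               // skip blank/malformed rows
        }
        [, $word, $syn, $ant] = $row;
        $dict[strtolower($word)] = [
            'synonyms' => $syn !== '' ? explode(';', strtolower($syn)) : [],
            'antonyms' => $ant !== '' ? explode(';', strtolower($ant)) : [],
        ];
    }
    fclose($fh);
    return $dict;
}

// Example row "1,success,achievement;accomplishment,failure" yields:
// ['success' => ['synonyms' => ['achievement', 'accomplishment'],
//                'antonyms' => ['failure']]]
```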

Intervention Model

After the agent domain model has indicated that a specific intervention is appropriate, the intervention model is responsible for making the final decision about whether or not an action needs to be taken. This is possible by performing a number of simple checks which ultimately determine if a supportive intervention should be made.

Part of this sequential checking procedure is estimating the time elapsed since the last agent intervention. This interval must exceed a specified threshold to avoid excessive interference from consecutive agent interventions. Additionally, before intervening in students’ discourse, the agent needs to ensure that its previous intervention has already been responded to. Eventually, if all conditions are met, the intervention model must determine the target of the specific intervention. For instance, the agent intervention may address the entire group or just a specific student. In fact, this design choice constitutes an interesting research question, examined in a following section.
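The gating logic described above amounts to a short sequence of guards; the sketch below is one plausible rendering, with the threshold value and field names being illustrative assumptions.

```php
<?php
// Sketch of the intervention model's sequential checks: timing threshold,
// previous-prompt answered, then target selection (group vs. one student).

function decideIntervention(array $s): ?array
{
    if (time() - $s['lastInterventionAt'] < $s['minGapSeconds']) {
        return null;  // too soon: avoid consecutive interruptions
    }
    if (!$s['previousAnswered']) {
        return null;  // the previous prompt is still awaiting a response
    }
    $target = $s['mode'] === 'WDI' ? $s['weakStudent'] : 'group';
    return ['target' => $target, 'prompt' => $s['pendingPrompt']];
}

print_r(decideIntervention([
    'lastInterventionAt' => time() - 120,
    'minGapSeconds'      => 60,
    'previousAnswered'   => true,
    'mode'               => 'WDI',
    'weakStudent'        => 'Janna',
    'pendingPrompt'      => 'Can you give an example of a successful person?',
]));
```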

Research Motivation and Questions

Currently, we have set up some broad research directions relating to:

  • Pedagogical effectiveness of the agent: exploring issues such as the level of agent intelligence and modeling required (including, for example, optimization of the peer interaction and the intervention models) so that a transactive dialogue is sustained, resulting in improved learning outcomes.

  • Acceptance by students and teachers: researching usability issues, facilitating the teacher-led domain modeling process and investigating what teachers would consider beneficial use scenarios for such a system in their courses.

In this work, we specifically present a pilot study in the CALL domain, exploring issues relating to the pedagogical effectiveness of the agent as well as students’ acceptance and the usability of the system. As we have emphasized elsewhere (Magnisalis et al. 2011), an important aspect in the design of collaboration support systems is how the system’s supportive intervention is presented to the learners. Walker et al. (2009) have proposed that the system intervention may vary, employing either a direct or an indirect approach in presenting the prompts. “Direct” means that the system directly addresses the learner who is considered to need support (i.e. the ‘weak’ partner), while “indirect” refers to the system addressing the other partner, who is supposed to follow the strategy suggested in order to help the ‘weak’ partner.

Based on the above considerations, the main objective of the study is to explore whether the ‘weak’-directed intervention mode (WDI: the agent addresses one particular student – the ‘weak’ partner) can be more beneficial than the undirected intervention mode (UI: the agent addresses the whole group and not any particular student). We think of this research question as a special case of the broader research issue explained above, that is, the effectiveness of differently presented support intervention techniques in the context of a collaborative activity. We expect the outcome of the study to inform researchers and designers on the pedagogical value of implementing any of these specific agent intervention modes.

Pilot Study

Method

Instructional Domain

The study was conducted in the domain of language learning. Generally, Computer-Assisted Language Learning (CALL) is considered as an ‘umbrella term’ that embraces the use of a wide variety of ICT applications for the acquisition of a foreign language (Levy and Stockwell 2006). The main idea behind most of the language learning applications is the combination of pedagogical theories, such as cognitivism and constructivism, with second language acquisition theories, such as Long’s interaction hypothesis (Long 1996).

Second Language Acquisition (SLA) refers to the process of learning a language other than the mother tongue (DeKeyser 2007). According to Long (1996), SLA is strongly facilitated by using the target language in learners’ interactions. Indeed, the importance of conversational interactions in second language learning has been repeatedly highlighted by researchers (e.g. Mackey 2007).

Earlier studies have indicated that the use of conversational agents in language learning settings can be quite helpful (Engwall 2008; Fryer and Carpenter 2006; Hjalmarsson et al. 2007; Massaro et al. 2006; Shawar and Atwell 2007; Wik 2011; Wik and Hjalmarsson 2009). Learners who interact with animated conversational agents tend to become more motivated and enjoy their interaction with the system (Fryer and Carpenter 2006; Shawar and Atwell 2007). Indeed, the versatility of conversational agents, which are able to enact different personalities and roles, can increase the interest of students and, thus, the time they spend on the learning task (Wik and Hjalmarsson 2009). Another benefit of embodied conversational agents is their potential to leverage features such as facial expressions and speech synthesis, allowing them to communicate with language learners using multimodal messages (Hjalmarsson et al. 2007; Massaro et al. 2006; Engwall 2008). According to Burnham and Lau (1999), this can be extremely useful since students listening to a foreign language incorporate visual information to a greater extent than people listening to their native language. Furthermore, Wik (2011) investigated the use of a conversational agent as a teaching companion for language learning. His study revealed that most of the students perceived their interaction with the embodied agent as a useful learning experience that enriched their vocabulary.

Drawing upon the findings of previous studies, most of which emphasize individual language learning, we decided to explore the use of our conversational agent in a collaborative language learning setting. Specifically, the study was carried out in the context of a third-year university course named “Practical Course of the English Language”. This course was offered by the Department of English Philology at the Taurida National University of Ukraine.

Participants

The participants were 30 (26 females and 4 males) undergraduate students enrolled in four different classes of the same teacher. Although students included native speakers of Russian, Tartar, Armenian, Ukrainian, and Hebrew, most of them spoke to each other in Russian. Their age ranged from 18 to 21 (M = 19.23, SD = 0.15).

Materials

The materials used in the exploratory study consisted of a pilot installation of the MentorChat tool, a post-task questionnaire and an open-ended interview protocol. Given that most participants were female, we decided to use a male agent (Steve) in MentorChat (Fig. 2, E), considering the research findings that using agents of the opposite gender can increase participants’ performance and motivation (e.g. Karacora et al. 2012).

Procedure

Prior to the date of the study, the teacher proposed three open-ended domain issues, each introducing one of the three phases of the activity. All issues were related to the concept of ‘success in life’ (e.g. “what do you believe are the ‘keys to success’?”), which had been previously discussed in the classroom. Additionally, the teacher configured the agent behavior via a set of rules. Each rule consisted of a key word or phrase and a relevant agent intervention. The key words and phrases selected by the teacher also included some misspelled or easily confused words, based on the teacher’s experience and awareness of students’ most common errors. The interventions were reflective questions that required students to elaborate on their discussion and provide a thoughtful, reasoned response (e.g. “Do you think that social background can influence career success? In what way?”).

First, students attended a 10-min presentation of MentorChat’s key functionalities, which aimed to familiarize them with the interface options and the types of agent interventions. Subsequently, students were randomly assigned to small groups (12 dyads and 2 triads) and asked to use MentorChat to collaborate and provide their joint answers to the three open-ended domain issues. The activity took place in two university computer labs.

At the beginning of every activity phase, the agent posted some messages that increased students’ awareness (e.g. “Janna has just entered phase 3”) and provided guidance on the roles students had to enact. Participants were assigned either the author or the reviewer role. The author was responsible for initiating the discussion and submitting the final answer of the group to the open-ended question at the end of each phase. The reviewer had to propose improvements to the author’s contributions by commenting, suggesting corrections, introducing additional information and so on. When students submitted their joint answer, they could proceed to the next phase, where their roles were reversed. In the case of the triads, there were always two reviewers and one author.

Students were asked to complete the three tasks of the collaborative activity within the 1-h limit. After finishing the activity, students were requested to complete a post-task questionnaire. Finally, 3 days later, students were invited to participate in a focus group session.

Compared Conditions

During the students’ discourse, the agent displayed a teacher-defined intervention whenever an associated linguistic pattern was identified. In our study, we examined and compared the effects of two intervention modes: a) the undirected interventions (UI) that targeted the entire group of students (for example, “Can you give an example of a successful person?”) vs. b) the ‘weak’-directed interventions (WDI) that targeted only the ‘weak’ student (for example, “Janna, can you give an example of a successful person?”). The agent intervention model was customized to function differently in each activity phase. More specifically, in the 1st phase, half of the teacher-defined interventions were tailored by the system to be WDI while the other half followed the UI formula. The 2nd phase of the activity included only UI while the 3rd phase incorporated only WDI.

As regards the UI (Table 1, row 2), the agent displayed its question to all group members, allowing any of them to respond. Peers were expected to coordinate so that one of them provided a response by typing in the agent answer box (Fig. 4, A), which remained available until an answer had been submitted.

Table 1 An example of a dialogue containing a UI
Fig. 4 A: UI interface, B1-B2: WDI interfaces

In the WDI condition (Table 2, row 2), although the agent displayed the question simultaneously to all peers, the answer box was disabled for all students (Fig. 4, B2) except for the one specified in the agent question (Fig. 4, B1). According to our WDI mode, the agent addressed the partner of the student who had activated the agent intervention by mentioning key domain concepts. In the following, we refer to the student who introduced these concepts into the discussion as the ‘S-student’ (‘S’ for ‘strong’), whereas the other student(s) are referred to as ‘W-students’ (‘W’ for ‘weak’). In Table 2, when Chris (the S-student) mentions task-related concepts that trigger an agent intervention (such as “bad mood” and “think positively”), the agent directs its question to Janna (the W-student). The underlying assumption is that, in the current dialogue turn, the W-student (Janna) might have a weaker understanding than the S-student (Chris) of the concepts brought up by the S-student. In the case of a three-member group, the agent selects the target of its WDI randomly between the two W-students of the group.

Table 2 An example of a dialogue containing a WDI
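In code terms, the targeting rule reduces to removing the S-student from the group roster and picking among the remaining members; a minimal sketch (names illustrative):

```php
<?php
// Sketch of WDI target selection: address a partner of the triggering
// (S-) student; in a triad, pick one of the two W-students at random.

function selectWdiTarget(array $groupMembers, string $sStudent): string
{
    $wStudents = array_values(array_diff($groupMembers, [$sStudent]));
    return $wStudents[array_rand($wStudents)];
}

echo selectWdiTarget(['Chris', 'Janna'], 'Chris'), PHP_EOL;         // Janna
echo selectWdiTarget(['Chris', 'Janna', 'Olga'], 'Chris'), PHP_EOL; // Janna or Olga
```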

Measures

Post-task Questionnaire

After finishing the collaborative activity, students completed a student opinion questionnaire, which collected profile data and elicited students’ opinions regarding their collaboration and interaction with the system. The questionnaire included 25 questions: 2 open-ended, 2 multiple-choice, and 21 Likert-scale items. The Likert-scale questions asked students to express their agreement or disagreement on a 5-point scale ranging from 1 (disagree) to 5 (agree).

In order to extract valuable information about the system’s overall usability and efficiency from the Likert-scale questionnaire items, we came up with the following three indexed variables: ‘perceived usefulness of peer discussions’ (Cronbach’s alpha = 0.74), ‘perceived usefulness of interactions with the agent’ (Cronbach’s alpha = 0.72) and ‘perceived ease-of-use of the system’ (Cronbach’s alpha = 0.71). These metrics were selected considering the principles of the Technology Acceptance Model, which is widely used in information systems research to model how users accept and use a technology (Venkatesh and Davis 2000).

Apart from the statistical analysis concerning the above indexed variables, measures of central tendency were also calculated for each variable of the questionnaire, which reflected a high overall reliability (Cronbach’s alpha = 0.88). Furthermore, a series of Pearson product–moment correlation coefficient measures were computed to assess the relationships between the questionnaire variables.

Discourse Analysis

A qualitative analysis of group discussions was performed to draw inferences regarding the effect of the agent interventions. In particular, an adjusted version of the IBIS discussion model was used to identify the types of students’ contributions. The IBIS model was initially designed by researchers (Kunz and Rittel 1970) as a method for structuring discussion activities of collaborative design. It was later proven to be an effective model for analyzing students’ on-line work in collaborative settings (Eggersmann et al. 2003; Liu and Tsai 2008). In the current study, we were specifically interested in measuring the amount of students’ explicit reasoning triggered by agent interventions in the two different conditions (WDI and UI). For this purpose, we adjusted the IBIS scheme to include, among others, the following discussants’ contribution types: ‘Explicit Positions’ and ‘Explicit Arguments’. The overall scheme used in our analysis is as follows:

  • Off-task: representing contributions irrelevant to the task (e.g. jokes)

  • Repetition: representing reiterations of previous statements

  • Issue: representing what needs to be done or resolved as the learners’ task. A contribution of this type could be a question to be answered in order to advance the conversation (e.g. “So, do you think that having someone to support you is important for your success?”)

  • Position: representing opinions related to the resolution of a raised issue (e.g. “I think having someone to support you is important.”)

  • Explicit Position: representing positions that explicitly display reasoning about domain concepts (e.g. “Social background can definitely influence career success because some poor families have no opportunities for education.”)

  • Argument: representing opinions that may support or object to a position (e.g. “I agree with your answer”, “I think you are right.”).

  • Explicit Argument: representing arguments that (similarly to explicit positions) display explicit reasoning about domain concepts (e.g. “I agree! We should mark our mini goals every day because this will help us achieve our final goal”).

  • Management: representing management-oriented contributions that are useful for the coordination of the activity (e.g. “There is no time left”, “let’s proceed to the next phase.”).

  • Common Understanding: representing short utterances (usually one- or two-word phrases) used by students to establish a common understanding on the subject of discussion (e.g. “OK?”, “Sure…”, “I see”).

Based on this scheme, student contributions were independently coded by the paper authors. Any disagreement was resolved by discussion.

Focus Group

Students were invited to participate in a focus group session. The focus group protocol was semi-structured and allowed for open-ended discussions. A series of questions were asked to elicit students’ opinions about: a) the activity as a whole (likes/dislikes), b) the collaboration with their partners and c) the role of the agent. The qualitative data derived from the discussions were transcribed verbatim and analyzed in search of common themes. An analysis was performed using the open-coding process of the constant comparative method (Corbin and Strauss 1990).

Results

Post-task Questionnaire

Most students (22 out of 30) were familiar with instant messaging applications. Additionally, students rated their typing skills as above average (n = 30, M = 3.55, SD = 0.19) on a 5-point scale (1-slow to 5-fast).

Table 3 displays the questionnaire results about the indexed variables related to system usability and user acceptance. A statistical analysis of all questionnaire variables indicated positive student opinions regarding their understanding of the interface functions (M = 4.77, SD = 0.21) and of their roles in the activity (M = 4.77, SD = 0.11), the system response time (M = 4.50, SD = 0.90), the enjoyment derived from their discussions (M = 4.47, SD = 0.77), and the perceived quality of the learning experience (M = 4.30, SD = 0.99). Also, the majority of the students agreed with statements such as “the agent questions helped me to recall/find out valuable information about the topics discussed” (M = 4.60, SD = 0.72) and “I understood the subject better through answering the agent questions” (M = 4.47, SD = 0.57). Slightly lower scores were reported for the performance (M = 3.60, SD = 1.13) and helpfulness (M = 3.86, SD = 0.94) of the agent’s computer-generated voice.

Table 3 Indexed variable values

A series of Spearman’s rank correlation coefficient measures were also computed to evaluate the relationships between the questionnaire variables. Table 4 reports some of the most significant correlations found.

Table 4 Significant correlations (* indicates significance at the 0.01 level)

In an optional question asking students to select their preferred intervention mode, 15 students chose the UI and 9 the WDI mode.

Discourse Analysis

The discourse analysis revealed a total of 731 student contributions, some of which consisted of multiple posts. Table 5 depicts the overall results of the coding process conducted for all student groups. Among the contribution types, students produced a total of 190 explicit positions and arguments (Table 5, rows 9 and 10). Such contributions, which display explicit reasoning, play an important role in calculating ERR, a metric described below that measures the efficiency of the intervention model.

Table 5 The results of the coding process

Table 6 illustrates the average time groups spent in each phase of the activity. As expected, the discourse analysis revealed that students spent some extra time in the first phase, providing some off-task contributions that either played a social function or were not directly related to the task. It should also be noted that all groups managed to successfully complete all three phases of the activity.

Table 6 The average duration of each activity phase in minutes (n = 14 groups)

Across the students’ discussions, 75 agent interventions were identified, 47 of which were WDI and 28 UI (Table 8, row 2). More specifically, 19 WDI and 8 UI were made during phase 1 (n = 14, M = 1.93, SD = 1.26), 20 UI during phase 2 (M = 1.43, SD = 0.85), and 28 WDI during phase 3 (M = 2.00, SD = 1.30). As for the impact of the two intervention modes, 76 out of the 190 explicit positions and arguments found in the students’ dialogues (Table 5, rows 9 and 10) were induced by the agent (Table 8, rows 3 and 4). A contribution was identified as agent-induced if it was strongly related to the agent intervention (regardless of the agent intervention mode and the partner who provided the contribution). In this manner, it could be either a direct answer provided via the agent answer box or a subsequent comment relevant to the agent intervention. Coding the agent-induced positions and arguments was a relatively easy task, since most of the students’ messages included explicit references to the agent intervention, such as the ones depicted in Table 7 below.

Table 7 An example of an agent intervention (WDI) inducing multiple explicit students’ responses

To analyze the impact of the different agent intervention modes on the ‘explicitness’ of students’ contributions, we calculated the ‘explicit response ratio’ (ERR) for both the WDI and UI conditions. Measurements were taken from all three phases of the activity. For each student group (n = 14), we divided the number of agent-induced explicit contributions (positions and arguments) by the number of agent interventions triggered in the group (Table 8). We consider ERR a statistical measure of the explicit responses generated in the dialogue due to agent intervention. To test for statistical significance, we applied parametric statistics, since the normality and homogeneity of variance criteria were satisfied. A paired-samples t-test identified a significant difference in favor of the WDI condition (n = 14, M = 1.21, SD = 0.44) as compared to the UI condition (n = 14, M = 0.77, SD = 0.58); t(13) = −2.70, p = 0.02, d = 0.85.
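Restating this definition in formula form, for each group g and intervention mode m:

```latex
\mathrm{ERR}_{g,m} \;=\; \frac{E_{g,m}}{I_{g,m}},
\qquad m \in \{\mathrm{WDI},\, \mathrm{UI}\}
```

where E_{g,m} is the number of agent-induced explicit contributions (positions and arguments) attributed to mode-m interventions in group g, and I_{g,m} is the number of mode-m interventions triggered in that group. The paired t-test then compares ERR under WDI against ERR under UI across the n = 14 groups.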

Table 8 The impact of the two intervention modes on ‘explicitness’

Because the two intervention modes differed in frequency (see Table 8, row 2), we also applied a correlation analysis to evaluate whether the ERR of the interventions was related to the number of interventions made in each group. The calculated Pearson product–moment correlation coefficients yielded no significant correlations between the quantity and the ERR of the agent interventions.

Focus Group

Three common themes were identified in the qualitative analysis of the focus group discussions (Table 9). Specifically, students stated that: a) the agent enriched their vocabulary by helping them recall important concepts previously discussed in the classroom (e.g. “It has been very helpful in remembering the words and phrases we had already discussed during the lectures”), b) although the computer-generated voice of the agent was helpful, it was not perceived as natural (e.g. “It is useful and understandable, but can’t be compared to the naturalness of human voice”), and c) they would prefer to discuss with an English native speaker rather than their classmate (e.g. “It would be better if I could discuss with a foreign student. A native speaker would be awesome!”).

Table 9 Qualitative themes

Discussion and Conclusions

In this paper, we have presented the design of MentorChat, a prototype system utilizing a teacher-configurable domain-independent conversational agent to trigger students’ productive dialogue. Additionally, we conducted a pilot evaluation study in the domain of second language acquisition, with the results indicating that the agent WDI mode is more effective in increasing explicit contributions in peer dialogue, as compared to the UI mode. We consider this to be a useful indication regarding the design of conversational agents in the CSCL context.

Discussion

Post-task Questionnaire

Regarding user acceptance and usability of the system, students’ opinions were positive. In particular, students perceived the system as an easy-to-use tool and considered their interactions both with their partner(s) and with the agent to be helpful for language learning (Table 4). Students enjoyed the discussion with their partner(s) and considered the collaborative activity to be a beneficial learning experience. Moreover, they stated that the agent increased their interest in the discussion and helped them recall important domain concepts previously discussed in the classroom. We believe that the agent questions helped students externalize parts of their domain understanding and practice their language skills by making their thoughts, reasons and ideas explicit. Thus, our study provides additional research evidence regarding the already recorded potential of reflective questions to maintain students’ interest, encourage students to think about the content of the lesson, and elicit particular structures of vocabulary items in language learning settings (Richards and Lockhart 1994).

Some significant correlations were also found with regard to the students’ answers in the post-task questionnaire (Table 4). First, students who did not frequently use chat applications reported enjoying their discussion in MentorChat more (Table 4, row 2). This is not surprising, since students who are unfamiliar with instant messaging applications can be expected to be more excited by the use of such a communication tool.

Furthermore, students who better understood their roles during the activity (author/reviewer) felt that the agent interventions helped them (Table 4, row 3) and improved their collaboration to a greater extent (Table 4, row 4). We interpret this as additional evidence supporting the view that the effectiveness of role playing in CSCL is closely linked to the success of the entire collaboration process (Kobbe et al. 2007).

In addition, participants who claimed to have been helped the most by the agent’s voice also considered that (a) the agent interventions were more helpful (Table 4, row 5) and (b) the activity enhanced their domain knowledge (Table 4, row 6). It seems that, despite the major challenges that Text-To-Speech (TTS) software faces (Taylor 2009), the utilization of speech synthesis techniques constitutes a crucial design factor, even for text-based dialogue systems.

Another correlation indicated that students who were more interested in their peer discussion perceived the collaborative activity as a more beneficial learning experience (Table 4, row 5), one that improved their domain knowledge (Table 4, row 6). Finally, it seems that interface intuitiveness is positively correlated with the extent to which learners considered the activity beneficial for learning (Table 4, row 7).

Discourse Analysis

The aim of this pilot study was to compare the effectiveness of two types of agent interventions: a) undirected intervention (UI), in which the agent addresses all group members, and b) ‘weak’-directed intervention (WDI), in which the agent questions target only the ‘weak’ (W-) student in each case. As explained, in WDI the S-student is always the one who triggers the agent intervention by mentioning key domain concepts (according to rules set by the teacher), and the agent directs its intervention (a prompting question) to the other student (the W-student).
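To make the two modes concrete, the following minimal Python sketch shows one way the intervention targeting could be operationalized. The rule structure, function names and example dialogue are our own illustration under stated assumptions, not MentorChat’s actual implementation.

```python
# A minimal sketch of how the UI and WDI intervention modes could be
# operationalized. The rule structure and names are illustrative only,
# not MentorChat's actual code.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Rule:
    keywords: set[str]   # teacher-defined key domain concepts
    prompt: str          # the agent's prompting question

def check_intervention(message: str, sender: str, peers: list[str],
                       rules: list[Rule], mode: str) -> tuple[str, list[str]] | None:
    """Return (prompt, addressees) if a rule fires, else None."""
    words = set(message.lower().split())
    for rule in rules:
        if rule.keywords & words:           # sender mentioned a key concept
            if mode == "UI":
                targets = [sender] + peers  # undirected: address the whole group
            else:                           # "WDI"
                targets = peers             # directed: address only the W-student(s)
            return rule.prompt, targets
    return None

# Usage: the sender who triggers the rule plays the S-student role; under WDI
# the prompt is addressed to the remaining group member (the W-student).
rules = [Rule({"idiom", "idioms"}, "Can you give an example of an idiom and explain it?")]
result = check_intervention("I think idioms are hard to translate",
                            "Anna", ["Boris"], rules, mode="WDI")
print(result)  # ('Can you give an example of an idiom and explain it?', ['Boris'])
```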

To assess the impact of the agent intervention mode on students’ dialogue, we conducted a discourse analysis, modeling each agent intervention as a generator of N explicit responses and introducing a new metric called ERR (explicit response ratio). ERR is a statistical measure of how many explicit responses (positions and arguments) are triggered by an agent intervention. Accordingly, ERR can range from zero (if no explicit responses are stimulated) to more than one (if an agent intervention stimulates multiple explicit responses). The analysis showed that the WDI condition triggered considerably more responses displaying explicit reasoning than the UI condition (Table 8). Note that this study only measures ERR, under the hypothesis that high ERR values are connected to highly transactive forms of peer dialogue and to better learning outcomes; these assumptions need to be investigated further.
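In symbols (the notation below is ours, introduced only for clarity), the ERR of a group $g$ can be written as

$$\mathrm{ERR}_g = \frac{R_g}{I_g},$$

where $R_g$ is the number of agent-induced explicit responses (positions and arguments) produced in group $g$ and $I_g$ is the number of agent interventions triggered in that group.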

A possible explanation for the higher ERR of the WDI mode is that WDI imposes beneficial situational constraints affecting the configuration of peers’ internal scripts and, consequently, their behavior. The term “internal script” refers to the knowledge an individual has about a recurring collaborative situation and, more specifically, to the internalization of collaborative practices as collaboration skills and general cognitive strategies. According to Fischer et al. (2013), in a collaborative activity learners’ understanding and actions are guided by internal scripts, which can be dynamically configured multiple times during the activity. Kollar et al. (2007) indicate that the development of well-configured internal scripts can improve subject matter knowledge acquisition and increase the quality of the arguments produced in collaborative settings.

In our study, we argue that the WDI mode facilitated the configuration process of students’ internal scripts by inducing situational constraints that reduced the probability of a dysfunctional configuration of internal script components. Extending our discourse analysis to identify patterns of student behavior after an agent intervention occurred, we found little to no coordination between partners in the UI mode. Most of the time, students did not discuss with each other who should submit the answer to the agent; in fact, some students rushed to respond first.

In the WDI condition, on the other hand, we identified a beneficial interaction pattern. In 7 (out of 14) groups, we observed that when an agent intervention targeted the W-student, the S-student also provided an explicitly reasoned contribution as a follow-up after the W-student had responded. Although further research with a larger sample is required to extend our understanding of this interaction pattern, we found that the S-student often made his/her opinion explicit as well, elaborating on concepts addressed by the agent question. Bearing in mind that this kind of interaction process can increase the level of transactivity (Ai et al. 2010b), we argue that the WDI mode can promote students’ productive talk more effectively than the UI mode. We expect the WDI mode to encourage more transactive conversational behaviors than the UI mode, as it asks the W-student to provide a reasoned response to questions that originated from previous contributions of the S-student. Similar views on how to augment the transactive quality of peer dialogue, by directing one student to reason on the basis of the other’s contributions, have been proposed by other researchers as well (Sionti et al. 2012). It is worth mentioning that most students selected UI as their preferred intervention mode in the post-task questionnaire, which can probably be explained by the higher level of freedom students experienced in the UI condition.

In general, we believe that the effectiveness of agent interventions can be improved by choosing a more appropriate level of coercion and inducing students to follow a specific protocol in their interaction with the agent. Enabling the system itself to regulate such student-agent interaction processes may therefore be more efficient than merely encouraging and expecting the students to play that role themselves.

Focus Group

During the focus group session, almost all students stated that the agent interventions enhanced their vocabulary skills through the question-answering process (Table 9, row 1). When asked why, some students mentioned that the agent questions introduced a new discussion perspective that made them expand their thinking and use new words.

Another theme concerned the agent’s computer-generated voice (Table 9, row 2). Although the students considered the agent’s voice helpful for the task, they commented that it was ‘unnatural’ and ‘robot-like’, mentioning that “it could certainly be improved”. This result is in line with the findings of Veletsianos (2012), who also reported similar learner dissatisfaction with an agent’s computer-generated voice. Nevertheless, it should be noted that, in our study, the agent’s voice was not perceived by the students as disruptive to the dialogue. On the contrary, some students stated that the agent’s speech helped them turn their attention to the agent intervention.

Finally, most of the Ukrainian students of English Philology stated that they would prefer to discuss with a foreign student (an English native speaker) instead of a classmate (Table 9, row 3). This suggests that, in the CALL domain, a more engaging conversational agent (possibly with a higher learning impact) should be ‘disguised’ as, and possess the conversational skills of, a native speaker of the target language. However, this type of agent design analysis is beyond the scope of this work.

Study Limitations

This study has certain limitations, such as the short duration of the activity and the small number of participants. Given that this is a pilot study, we consider it to provide valuable insights to be verified by a more robustly designed research study in the future. Another limitation might stem from ordering effects emerging from the specific phase sequencing. More specifically, placing the WDI intervention mode in the third (and final) phase of the activity may have biased the results in favor of the WDI mode (that is, students already familiar with agent interventions might have been more active in producing explicit responses during the third, WDI-based phase). Although such an ordering effect cannot be excluded, we do not believe that it had a major impact on the outcomes, for two reasons. First, measuring the impact of the two modalities was also based on data recorded during the first activity phase: 19 (out of 47) WDIs (40 % of all WDIs) already appeared in the first phase, when hardly any ordering effect could have emerged. Second, a demonstration of the two intervention modalities took place before the activity started, so the students were, to a certain extent, familiar with both modalities right from the beginning.

Conclusion

Despite the limitations discussed above, we consider the results of this pilot study encouraging and promising. Our findings indicate that the utilization of such a teacher-configurable conversational agent can be both feasible and beneficial for language learning. In addition, the study provides some preliminary evidence in favor of what we call the ‘weak’-directed agent intervention mode (the agent addressing only a specific group member). This modality seems to engage students in more productive dialogue (as measured by the ERR variable) than an undirected intervention mode. Undoubtedly, more research is needed to better understand how students’ productive talk can be facilitated by efficiently modeling student-agent interactions in collaborative situations. We consider that MentorChat creates some interesting opportunities for targeted research in the area of conversational agents for collaborative learning.