The experiment was conducted at a German university between 28 May and 18 June 2019. The university’s students were invited to participate on a voluntary basis, and 91 people took part during this period. We randomly assigned the participants to two groups, resulting in a well-balanced sample of 46 participants in the control group without a virtual assistant and 45 in the experimental group using a virtual assistant. Overall, 54.9% of the participants were female (N = 50), and their age ranged from 18 to 31 (M = 22.01, SD = 3.02), indicating a rather young sample. Furthermore, 80% of the participants had passed their A-levels, while 14% held a Bachelor's degree. Together with the young age and the way the sample was recruited, this indicates a typical undergraduate student sample.
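The random assignment described above can be illustrated with a minimal sketch. The exact randomisation procedure is not reported, so the shuffle-and-split approach, the function name assign_groups and the seed are assumptions made purely for illustration.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Shuffle the participants and split them into two near-equal groups.

    Hypothetical helper: the study does not report how the random
    assignment was implemented.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # With an odd number of participants the first group gets one more.
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]


# 91 participants, numbered 1..91 for illustration.
control, experimental = assign_groups(range(1, 92), seed=2019)
print(len(control), len(experimental))
```

With 91 participants, such a split yields group sizes of 46 and 45, matching the sizes reported for the control and experimental groups.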
NASA task load index (NASA-TLX)
Concepts related to cognitive load are frequently measured using self-report rating scales (Paas et al. 2003). This approach assumes that learners are able to report the amount of mental effort that they experienced while attempting to solve a task. It is worth noting that self-report rating scales do not typically distinguish between the three types of cognitive load (intrinsic, extraneous, germane) but rather measure the overall load experienced.
A commonly used scale to quantify the perceived workload of a participant is the NASA Task Load Index (Galy et al. 2012). The National Aeronautics and Space Administration (NASA) developed the NASA-TLX to measure the perceived workload of a task (Hart and Staveland 1988). It has been used successfully in both laboratory and field studies (Rubio et al. 2004; Noyes and Bruneau 2007; Cao et al. 2009). The index contains six subjective subscales that form the NASA-TLX score: (1) Mental Demand, (2) Physical Demand, (3) Temporal Demand, (4) Performance, (5) Effort, and (6) Frustration. These clusters of variables were chosen to cover the “range of opinions and apply the same label to very different aspects of their experience” (Hart 2006, p. 904). Because carrying out a task is a subjective experience, the NASA-TLX was developed to capture the perception of a variety of activities, from simple laboratory tasks to flying an aircraft. While (1) describes how much mental and perceptual activity was required, (2) shows the perceived amount of required physical activity. Besides the perceived mental and physical efforts, the NASA-TLX also covers the perception of time pressure (3) during a task. Furthermore, subscales (4) to (6) ask about the perception of the results of the given tasks. Subscale (4) describes the perceived personal performance, i.e. the perceived success in reaching the goals of the task, and (5) asks how hard the participants had to work to reach the achieved level of performance. As people sometimes feel frustrated when a given task is perceived as too difficult, subscale (6) asks the participants about their level of frustration during the task (Hart 2006). In our experiment, the subscales showed high reliability (Cronbach’s α = 0.89).
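As an illustration of how the six subscales can be combined into one score, the following sketch computes an unweighted ("raw") NASA-TLX score as the mean of the subscale ratings. Whether the study used the raw or the weighted variant is not stated here, so the unweighted mean, the 0 to 20 rating range and the example ratings are assumptions.

```python
from statistics import mean

# The six NASA-TLX subscales named in the text.
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Return the unweighted mean of the six subscale ratings.

    Assumption: all subscales are rated on the same range (here 0 to 20).
    """
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    return mean(ratings[name] for name in SUBSCALES)


# Made-up ratings for one participant, for illustration only.
example = {
    "mental_demand": 15,
    "physical_demand": 3,
    "temporal_demand": 14,
    "performance": 10,
    "effort": 13,
    "frustration": 11,
}
print(raw_tlx(example))
```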
Resilience scale (RS-11)
According to appraisal theory, stress emerges when a task at hand exceeds one’s own resources and abilities (Smith et al. 2011). Consequently, an increasing level of stress might affect the participant’s task performance as well as the perception of the work and its outcome. To avoid an undetected distortion of task performance, we consider the psychological resistance to stress or difficult situations, known as resilience (Neyer and Asendorpf 2017). We use the Resilience Scale (RS-11) as a short scale for assessing a person’s resilience (Schumacher et al. 2005). The RS-11 is a self-report scale containing eleven items, which are divided into two subscales: (1) personal competence and (2) acceptance of the self and life. The subscales showed high reliability (all Cronbach’s α = 0.90).
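The reliability coefficient reported for both questionnaires, Cronbach’s α, can be computed from item-level responses as sketched below. The sketch uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum scores); the function name and the toy responses are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` is a list of rows, one row per respondent and one
    column per item. Sample variances (n - 1) are used throughout.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))            # one tuple per item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Made-up answers of four respondents to three items, for illustration.
toy = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
print(round(cronbach_alpha(toy), 2))
```

Values close to 1 indicate that the items vary together, which is the sense in which α values of about 0.9 are described as high reliability.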
To investigate whether a text-based VA decreases the cognitive load during task-solving, we made use of Google’s cloud platform DialogFlow (see Footnote 1). This platform is widely used for developing natural and rich conversational experiences based on Google’s machine learning (Canonico and De Russis 2018). The implementation is based on four general concepts (Muñoz et al. 2018). First, Agents transform natural user language into actionable data when a user input matches one of the intents. Second, Intents represent a mapping between what the user says and what action is taken. Third, Entities represent concepts and serve as a tool for extracting parameter values from natural language inputs. Finally, Contexts are designed for passing on information from previous conversations or external sources. To reduce the complexity caused by the interaction with the VA, we focused on establishing a disembodied VA with a messaging-based interface (Araujo 2018).
As VAs exhibit social and conversational dialogue (Hung et al. 2009), our VA is implemented to open the interaction with a simple conversation, for example by asking for the participant’s name and feelings. Participants interact with the VA via a web-based interface, similar to contemporary instant messengers such as Telegram or WhatsApp, using a keyboard and a computer screen. Furthermore, the VA is text-based to avoid additional influencing factors that may arise from voice interactions or embodied avatars. Figure 1 shows a translated example of a dialogue with the VA.
To support the participants during the task, the assistant simulates intelligence by selecting a prefabricated answer based on distinct keywords used in the participant’s input. We defined 25 Intents to match the user input. The Intents fell into roughly three groups: Introduction, Tutorial and Task Support. The Intents in the Introduction group mostly revolved around welcoming the users and asking about their well-being and readiness to start the task. The Tutorial Intents were designed to increase the users’ familiarity with the VA and its capabilities. Most of the Intents belonged to the Task Support group, where users could ask for help with solving the task, for example by asking what certain parameters meant or how they were calculated. We also used the standard “sys.given-name” Entity provided by DialogFlow. The VA’s feedback includes a question-answering component (Morrissey and Kirakowski 2013; Lamontagne et al. 2014) that can be queried by the user to gain information, support and instruction about the specific task. In this context, the VA only provides helpful hints that support the participants in solving the task. It does not deliver the actual solution to the current task.
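The principle of selecting a prefabricated answer from keywords in the user input can be sketched in a few lines. This is a plain-Python illustration of the idea, not DialogFlow's API or matching algorithm; the intent names, keywords and answer texts are hypothetical and do not reproduce the 25 Intents used in the study.

```python
import re

# Hypothetical intents: each maps a set of keywords to a fixed answer.
INTENTS = [
    {
        "name": "introduction.greeting",
        "keywords": {"hello", "hi"},
        "answer": "Hello! How are you feeling today?",
    },
    {
        "name": "task_support.earliest_start",
        "keywords": {"earliest", "start"},
        "answer": "The earliest start of a step depends on when all of "
                  "its preceding steps can be finished.",
    },
    {
        "name": "task_support.slack",
        "keywords": {"slack", "float", "buffer"},
        "answer": "The slack of a step is the time it can be delayed "
                  "without delaying the whole project.",
    },
]

# Used when no keyword of any intent occurs in the input.
FALLBACK = "Sorry, I did not understand that. Please rephrase your question."


def select_answer(user_input):
    """Return the answer of the intent sharing the most keywords with the input."""
    tokens = set(re.findall(r"[a-z]+", user_input.lower()))
    best, best_hits = None, 0
    for intent in INTENTS:
        hits = len(intent["keywords"] & tokens)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best["answer"] if best else FALLBACK


print(select_answer("What does slack mean?"))
```

In keeping with the design described above, the example answers give hints about the parameters but never the solution to the task.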
Task performance and pre-study
Task performance was measured with a score ranging from 0 to 28 that captures how well participants performed on a critical path method (CPM) task. A higher value represents better performance in the execution of the task. The goal of this task was to use the CPM to plan a research project for the market research unit of a large organisation.
The task was selected in a pre-study to ensure that it was sufficiently demanding and involved a potentially high perceived workload in the experiment. The sample of 10 participants (6 female, 4 male) consisted of randomly selected students at the University. In this context, a well-fitting task challenges the participants at an appropriate level and therefore causes an increased cognitive load score. A task that overwhelms the participants may prevent sustained learning effects because fewer cognitive resources are available (Paas et al. 2003). To this end, a text-based task (TBT) and the CPM were compared. On the one hand, the TBT required the participants to read three texts about the Middle Ages, a topic that does not rely on the participants’ previous knowledge. On the other hand, the CPM was implemented with a scenario that puts the participants in a working context: they had to organise a marketing study using the CPM. The time limit for each of the tasks was 10 min.
Each task was given to five participants, and the perceived workload was measured with the NASA-TLX. The participants’ age ranged from 22 to 31 (M = 25). On average, participants given the CPM task reported higher NASA-TLX scores (M = 12.5, SD = 3.85) than the TBT group (M = 6.36, SD = 4.06). This difference of 6.13 was significant (95% CI [0.35, 11.91], t(8) = 2.44, p = 0.040). Furthermore, it represents a large-sized effect, d = 0.98. Consequently, the CPM task has the potential to increase the cognitive load of the participants more effectively than the TBT. Thus, because it offers more potential to benefit from the use of a virtual assistant, the CPM was chosen for the main study.
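The group comparison can be retraced from the reported summary statistics with a short sketch. It assumes a pooled-variance (Student's) independent-samples t-test, which is consistent with the reported 8 degrees of freedom but is not named explicitly in the text; the function name is hypothetical, and the critical value 2.306 is the two-sided 95% quantile of the t distribution with 8 degrees of freedom. Because the inputs are rounded means and standard deviations, the results can only match the reported values up to rounding.

```python
from math import sqrt


def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's t-test from group means, standard deviations and sizes.

    Returns the mean difference, its standard error, the t statistic
    and the degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m1 - m2
    return diff, se, diff / se, df


# Two-sided 95% critical value of Student's t for 8 degrees of freedom.
T_CRIT_DF8 = 2.306

# Reported pre-study values: CPM group vs. TBT group, five participants each.
diff, se, t, df = pooled_t_from_summary(12.5, 3.85, 5, 6.36, 4.06, 5)
ci = (diff - T_CRIT_DF8 * se, diff + T_CRIT_DF8 * se)
print(f"difference = {diff:.2f}, t({df}) = {t:.2f}, "
      f"95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```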
To investigate the influence of a VA on the perceived workload of a participant, the experiment used a between-subjects design. The independent variables were the resilience score (RS-11) and the usage of a VA (group variable), whereas the dependent variables were the perceived workload (NASA-TLX), the task score and the time to finish the task. Analyses were conducted using the software SPSS Statistics (Version 25) and Jamovi (22.214.171.124).
The main study was conducted as a laboratory experiment at a German university and was held in German. A laboratory experiment was chosen to better control the surroundings, to ensure that the task performance was measured correctly and to ensure a steady and even experience with the virtual assistant. Furthermore, the investigators were present to assist the participants should questions arise. However, none of the subjects made use of their assistance.
The participants were welcomed by the investigator and introduced to the study. They were then led to a computer to begin with the first questionnaire, the RS-11, which was used to obtain the resilience score.
Afterwards, they were presented with an introduction to the CPM followed by an example. After reading through the briefing, participants were instructed to contact the investigator for the material needed. The goal was to use the CPM to plan a research project for the market research unit of a large organisation. Participants were given a list of unordered process steps (such as "literature research", "conducting the study" or "develop methodology"), the duration of each step and its dependencies on the other steps in the process. They were also handed an empty CPM template to fill out with the corresponding parameters. Finally, the participants were informed that they were allowed to use a virtual folder, which was located on the laboratory computer and contained unordered text files explaining the CPM procedure and the calculation of the individual values.
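To make the kind of calculation required by the task concrete, the following sketch runs the forward and backward pass of the CPM on a small network. Three step names are taken from the examples above; the steps "recruit participants" and "write report", all durations and all dependencies are hypothetical and do not reproduce the study material. The sketch assumes that the dependencies contain no cycles.

```python
def critical_path(durations, predecessors):
    """Forward and backward pass of the critical path method.

    Returns the project duration, the slack of every step and the
    list of critical steps (slack of zero).
    """
    # Order the steps so that every step comes after its predecessors.
    order, seen = [], set()

    def visit(step):
        if step in seen:
            return
        seen.add(step)
        for pred in predecessors.get(step, ()):
            visit(pred)
        order.append(step)

    for step in durations:
        visit(step)

    # Forward pass: earliest start and earliest finish.
    earliest_start, earliest_finish = {}, {}
    for step in order:
        preds = predecessors.get(step, ())
        earliest_start[step] = max(
            (earliest_finish[p] for p in preds), default=0)
        earliest_finish[step] = earliest_start[step] + durations[step]
    project_end = max(earliest_finish.values())

    # Backward pass: latest finish and latest start.
    successors = {step: [] for step in durations}
    for step, preds in predecessors.items():
        for pred in preds:
            successors[pred].append(step)
    latest_start = {}
    for step in reversed(order):
        latest_finish = min(
            (latest_start[s] for s in successors[step]), default=project_end)
        latest_start[step] = latest_finish - durations[step]

    slack = {s: latest_start[s] - earliest_start[s] for s in durations}
    critical = [s for s in order if slack[s] == 0]
    return project_end, slack, critical


# Hypothetical durations (in days) and dependencies.
durations = {
    "literature research": 5,
    "develop methodology": 3,
    "recruit participants": 4,
    "conducting the study": 6,
    "write report": 2,
}
predecessors = {
    "develop methodology": ["literature research"],
    "recruit participants": ["literature research"],
    "conducting the study": ["develop methodology", "recruit participants"],
    "write report": ["conducting the study"],
}
project_end, slack, critical = critical_path(durations, predecessors)
print(project_end, slack, critical)
```

In this made-up network, only "develop methodology" has slack, so every other step lies on the critical path.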
Additionally, the participants in the experimental group were given a sheet of paper explaining that they were allowed to use a text-based VA, which was embedded in a browser window on the computer. They were then instructed how to use the VA properly, for example by writing single sentences, and were told that the VA did not have contextual knowledge. All subjects in the experimental group made use of the VA. It provided the same information that was available to all groups in the folder but could be asked specifically for certain information, e.g. what certain parameters stood for or how they were calculated. Figure 2 depicts the steps of a conversation with the VA. Except for the availability of the VA, the participants in the control and experimental groups were presented with exactly the same task. All participants also had access to the same information for solving the task. The only difference was that subjects in the control group could access it by browsing through virtual folders on the computer, whereas subjects in the experimental group could request it specifically in a dialogue with the VA.
Participants then had a time limit of 10 min to complete the task, after which they had to stop working even if they had not yet completed it. They were also instructed to give notice if they finished before the time limit had run out. In these cases, the investigator noted the time that was needed. After the participants had either completed the task or run out of time, they were directed back to the computer to complete the remainder of the survey.
Following the task, the participants were presented with the NASA-TLX to assess their perceived workload immediately after solving the task. They were then asked whether they had already been familiar with the CPM technique, and the participants in the experimental condition were additionally asked whether they thought the support by the VA was helpful.
Finally, all participants were asked for their gender, age and highest educational attainment. They were then debriefed, asked whether they had any further questions and thanked for their time (see Footnote 2).