To involve seniors in the definition, development and subsequent optimization of the EMPATHIC-VC, it was necessary to employ various early-stage prototyping methods (e.g. use case descriptions, sketches and scenarios). One of the methods used, which is particularly popular when building technology based on natural language (Schlögl et al. 2015) or other types of artificial intelligence driven applications (Dahlbäck et al. 1993), was Wizard of Oz (WOZ). The key principle of the WOZ method is that study participants believe they are interacting with an autonomous system while the system’s actions are actually controlled by a human (i.e. the ‘wizard’). In most cases this wizard is situated in a different room and connected to the study setting through a remote network connection. Consequently, WOZ sessions require a minimum of two researchers: the wizard controlling the technology and an additional facilitator dealing with all participant-related tasks (e.g. welcoming, informed consent, questionnaires and debriefing). For the simulated EMPATHIC-VC, both of these researchers received relevant training to prepare them for their tasks. The facilitator had to follow a strict procedural protocol when receiving participants and administering questionnaires (cf. Sect. 3). The wizard received dedicated training on the WOZ platform used (cf. Sect. 2.1) as well as on the dialogue structure that had to be followed.
The Wizard of Oz platform
Since decisions on the overall architecture of a virtual agent based application, such as the one envisioned by the EMPATHIC-VC, usually require extensive discussions, it was decided to use WebWOZ (Schlögl et al. 2010a) as a separate WOZ prototyping platform for early-stage investigations. WebWOZ, which has previously been used by a number of research and development initiatives (e.g. Cabral et al. 2012; Milhorat et al. 2013; Sansen et al. 2016), offers an adjustable wizard interface which can be structured according to different dialogue stages (Schlögl et al. 2010b, 2011). For simulating interactions with the EMPATHIC-VC, the WebWOZ wizard interface was further extended with an audio/video transmission and recording function based on the WebRTC standard, a graphical representation of the dialogue to help guide the wizard, and the possibility to upload and subsequently integrate text-based utterances. In addition, the WebWOZ client interface was integrated with five different virtual agents, which allowed participants to select their preferred interaction partner (Torres et al. 2019b).
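The idea of a wizard interface organized by dialogue stage, fed by uploaded text-based utterances, can be illustrated with a minimal sketch. This is not WebWOZ's actual data model or API: the class `WizardPanel`, the function `load_utterances` and the tab-separated upload format are hypothetical names and assumptions introduced here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def load_utterances(text: str) -> Dict[str, List[str]]:
    """Parse uploaded text with one 'stage<TAB>utterance' pair per line.

    The tab-separated format is an assumption made for this sketch.
    """
    stages: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stage, utterance = line.split("\t", 1)
        stages.setdefault(stage.strip(), []).append(utterance.strip())
    return stages


@dataclass
class WizardPanel:
    """Hypothetical stand-in for a wizard interface grouped by dialogue stage."""

    utterances: Dict[str, List[str]]
    # Each log entry is (stage, what the participant said, what the wizard sent).
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def options(self, stage: str) -> List[str]:
        """Return the prepared utterances the wizard can choose from in a stage."""
        return self.utterances.get(stage, [])

    def respond(self, stage: str, choice: int, heard: str) -> str:
        """Record the wizard's choice and return the reply shown to the participant."""
        reply = self.options(stage)[choice]
        self.log.append((stage, heard, reply))
        return reply
```

In such a setup the participant only ever sees the returned reply, while the human wizard selects it from the prepared options of the current stage; the log keeps the exchange for later analysis.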
The coaching scenarios implementing the GROW model
Coaching has been defined as a result-oriented, systematic process. It generally uses strong questions to help people discover their own abilities and draw on their own resources. In other words, the role of a coach is to foster change by facilitating a coachee’s movement through a self-regulatory cycle (Grant 2003). One of the most commonly used coaching methodologies is the GROW model (Whitemore 2009). This model provides a simple methodology and an adaptable structure for coaching sessions. Moreover, its efficiency has been demonstrated in the context of theoretical behavior change models such as the Transtheoretical Model of Change (TTM) (Passmore 2011, 2012).
A GROW coaching dialogue consists of four phases which give the model its name: Goals or objectives, Reality, Options, and Will or action plan. During the first phase (Goal), the interaction aims at specifying the objective that the user wants to achieve, for example, reducing salt intake in order to diminish the related risk of hypertension. Then, this goal has to be placed within the personal context in which the user lives (Reality), and potential obstacles need to be identified. In the next phase (Options), the agent’s goal is to encourage the user to analyze his/her options for achieving the objective within his/her reality. The final goal of the interaction is the specification of an action plan that the user will carry out in order to advance towards the goal (Will). The EMPATHIC-VC is planned to deal with four coaching sub-domains: nutrition (Sayas 2018b), physical activity (Sayas 2018c), leisure (Sayas 2018a), and social and family engagement. A professional coach provided a set of handcrafted coaching sessions for each of these sub-domains.
The GROW model uses Goal Set Questions (GSQ; e.g. “Welcome Jorge, how can I help you?”) to define the objective of the user, Motivational Questions (MQ; e.g. “What would you achieve if you changed the way you eat?”) to look for some sort of motivation which may help him/her achieve a set goal, Reality/Resources Questions (RQ; e.g. “And what happens when you just eat bits?”) to analyze the current situation of the user and establish resources, Obstacle Questions (OQ; e.g. “And if you are out, what are you going to snack on?”) to determine obstacles to the accomplishment of the goal, Option Generation Questions (OGQ; e.g. “What small step could you take that would get you closer to your milestone of having meals planned?”) to define possible actions a user has to perform in order to achieve the goal, Plan Action Questions (PAQ; e.g. “What are you going to do to achieve your goal of adopting a more regular eating pattern?”) to establish an action plan, Following Questions (FQ; e.g. “How has your plan gone concerning the timing of your meals?”) to ask a user about an ongoing plan, and Warning Questions (WQ; e.g. “What is your blood pressure like?”) to find out whether the user has any (other) health problems which may need to be considered. The GROW model structure is shown in Fig. 1.
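The question taxonomy described above lends itself to a simple data structure. The following is a minimal sketch, not the representation used in the EMPATHIC project: the names `Phase`, `QuestionType`, `QUESTION_TYPES` and `questions_for` are hypothetical, and the assignment of question types to phases is an assumption based only on the descriptions in the text (it is left as `None` where the text does not tie a question type to a phase).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    """The four GROW phases, in dialogue order."""
    GOAL = "Goal"
    REALITY = "Reality"
    OPTIONS = "Options"
    WILL = "Will"


@dataclass(frozen=True)
class QuestionType:
    code: str
    name: str
    phase: Optional[Phase]  # assumption; None where the text names no phase
    example: str


QUESTION_TYPES: List[QuestionType] = [
    QuestionType("GSQ", "Goal Set Question", Phase.GOAL,
                 "Welcome Jorge, how can I help you?"),
    QuestionType("MQ", "Motivational Question", None,
                 "What would you achieve if you changed the way you eat?"),
    QuestionType("RQ", "Reality/Resources Question", Phase.REALITY,
                 "And what happens when you just eat bits?"),
    QuestionType("OQ", "Obstacle Question", Phase.REALITY,
                 "And if you are out, what are you going to snack on?"),
    QuestionType("OGQ", "Option Generation Question", Phase.OPTIONS,
                 "What small step could you take that would get you closer "
                 "to your milestone of having meals planned?"),
    QuestionType("PAQ", "Plan Action Question", Phase.WILL,
                 "What are you going to do to achieve your goal of adopting "
                 "a more regular eating pattern?"),
    QuestionType("FQ", "Following Question", None,
                 "How has your plan gone concerning the timing of your meals?"),
    QuestionType("WQ", "Warning Question", None,
                 "What is your blood pressure like?"),
]


def questions_for(phase: Optional[Phase]) -> List[QuestionType]:
    """Return the question types assigned to a phase (None = unassigned)."""
    return [q for q in QUESTION_TYPES if q.phase is phase]
```

A wizard-side tool could, for instance, call `questions_for(Phase.REALITY)` to list the question types available while the dialogue is in the Reality phase.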
An example of such a handcrafted session is shown in Fig. 2. The wizard strategy was then designed according to two different scenarios based on the conversations and indications provided by the professional coach. However, the wizard had to develop and add new strategies to deal with real user interactions. Thus, a specific wizard profile was created to define the system's behavior.
The user-centered iterative design for elders’ virtual agent acceptance
In implementing a virtual coach devoted to assisting the elderly population in their independent living, the goal was to abandon the techno-centric paradigm of human–machine interaction and focus on the needs and intentions of the relevant elder end-users, as well as their abilities, aptitudes, preferences and desires. Regarding implementation, the initial requirements for the EMPATHIC-VC were accessibility and usability for a wide variety of elderly users, including field experts, practitioners, and persons differing in knowledge (culture, education and occupation), needs (impaired and communicatively disordered individuals), age and preferences.
To this end, we adopted a user-centered iterative design, assessing users’ interactions in context so that (a) trustworthy human–agent relationships are built, (b) emotional states and negative moods such as depression are reliably detected (Buendia and Devillers 2014; Cavanagh and Millings 2013; DeSteno et al. 2012; Parker and Hawley 2013), and (c) appropriate advice on actions is provided. This design built upon several theoretical experiments that collected a substantial quantity of data assessing seniors’ willingness and interest in initiating and maintaining conversations with an agent with respect to different qualitative agent features (such as gender and voice), in comparison with differently aged populations such as adults and adolescents.
With this research, we have acquired a deeper understanding of how to design emotionally-aware interactive agents that exhibit coherent visual, vocal and gestural affordances, and adapt to the user’s underlying intentional and emotional states in a cooperative and ethically sound manner. All the executed experiments were driven by the key idea that any intelligent social ICT interface should be capable of establishing an empathic relationship; hence the emphasis of the investigations was on mood enhancement linked to use-cases in e-mental health and support for older/vulnerable people.
The rich repertoire of theoretical results acquired is summarized below, in particular for agents’ gender and voice.
A first pilot experiment, focusing on user requirements and expectations with respect to participants’ age and familiarity with technological devices (such as smartphones, laptops and tablets), showed that, as for gender, elders prefer to be assisted by female agents (Esposito et al. 2018b). In this context, an ad-hoc questionnaire was developed to assess seniors’ preferences, expectations and requirements, in order to customize the subsequently developed EMPATHIC-VC to the needs of the targeted end-user population, i.e. elders.
It should be noted that, starting with this pilot, the questionnaire has been gradually modified in an attempt to incorporate the Technology Acceptance Model (TAM) proposed by Davis (1989) and the pragmatic and hedonic dimensions proposed by Hassenzahl (2004). The result has been given the name Virtual Agent’s Acceptance Questionnaire (VAAQ) and may count as a direct outcome of the EMPATHIC project.
From the above-mentioned pilot investigation, which used an early version of the VAAQ, it was further learned that seniors’ preference for female agents was significantly higher than for male agents on all questionnaire dimensions, independently of seniors’ gender and technology savviness.
In order to remove the biases introduced by differences in agents’ personalities, a second set of experiments was conducted (Esposito et al. 2018a). In these trials, the four proposed agents (two male and two female) were endowed with a “neutral” personality, and their facial expressions were neither smiling, sad, nor worried. This test confirmed seniors’ preference to be assisted by female agents, which scored significantly better than male agents in all questionnaire subsections.
In order to assess whether seniors’ preference for female speaking agents was a specific requirement of the elder population, we defined another set of tests involving adolescents, adults and seniors, for a total of 316 participants split into 7 groups, each composed of approximately 45 subjects and balanced for gender (Esposito et al. 2019a). There were two groups of adolescents (mean age = 14.5, SD = ± 0.5 years), two of adults (mean age = 25.1, SD = ± 3.5 years), and two of seniors (mean age = 71.4, SD = ± 6.5 years). It was found that elders’ willingness to interact was significantly higher for speaking than for mute agents and, in the speaking context, significantly higher for female than for male speaking agents. In addition, for elders in the speaking context, female agents were judged significantly more positively than male agents on attractiveness as well as on pragmatic and hedonic (identity and feeling) qualities. None of these significant differences was observed for adolescents and adults presented with mute and speaking agents, or for elders presented with mute agents.
When the three elder groups were compared on their enjoyment/acceptance scores for mute, speaking and voice-only interfaces, elders’ preferences were significantly higher for female speaking agents and female voice-only interfaces.
The experiments discussed suggest that the successful incorporation of assistive social technologies in everyday life depends strongly on the user’s perception and acceptance of them (de Graaf et al. 2015). In particular, robots, virtual agents and, more generally, interactive assistive user interfaces need to be specifically tailored to people’s needs and personalized according to their specific requirements and expectations (Seiki et al. 2017).