1 Introduction

Eating too few vegetables, exercising too little, going to bed too late: there are many health behaviors people wish to change. Numerous eHealth applications exist to support them, with potential benefits such as fostering user empowerment, providing around-the-clock support, and being scalable [10, 29]. Behavior change support could thus become more widely available and effective. This is useful for behaviors such as smoking, where most people try to change without support [13] and often do not succeed [18]. Nevertheless, eHealth applications suffer from low adherence [22, 27]. Virtual coaches have the potential to combat this by increasing engagement and forming a connection with users [34]. However, it is not yet well understood how such virtual coaches can be designed so that users have a positive attitude toward them.

Previous work on smoking cessation has identified several aspects of a virtual coach that contribute to such a positive attitude. This includes the caring character of and positive feedback from an embodied virtual coach for veterans [1], practical tips and the feeling of being supported by the StopCoach Suzanne [33], and the nonjudgmental, supportive, and caring character of the embodied virtual coach Jen [25]. Yet, not only characteristics of the virtual coach play a role, but also ones of users and their environment [21]. Meijer et al. [33], for example, saw that the StopCoach may be harder to use for older people and that its support may be especially beneficial for people with little support from their social environment. Given the multitude of factors that can affect users’ attitudes toward a virtual coach, more insights are needed. Due to the novelty effect [19, 37], insights that are based on users interacting with a virtual coach for a longer period of time are especially welcome.

We thus conducted a study with more than 500 smokers interacting with the virtual coach Sam in five conversational sessions spread over at least nine days. In each session, Sam assigned users a new preparatory activity for quitting smoking, such as noting and ranking reasons for quitting. As becoming more physically active may make it easier to quit smoking [23, 35], half of the activities addressed becoming more physically active. In the next session, Sam asked users about their effort spent on and experience with the activity. After the five sessions, users were asked about their relationship and willingness to continue working with Sam. Based on a mixed-methods analysis of users’ responses about their relationship and willingness to continue working with Sam, their characteristics, and findings from the literature, we identified eight themes describing users’ attitudes toward Sam. We used these themes to formulate literature-based recommendations to guide designers of virtual coaches for behavior change.

2 Materials and Methods

We conducted a study from 20 May 2021 until 30 June 2021 on the online crowdsourcing platform Prolific. The study was approved by the Human Research Ethics Committee of Delft University of Technology (Letter of Approval number: 1523) and preregistered in the Open Science Framework (OSF) [3]. The dataset [5] and analysis code [4] are available online.

Virtual coach. We implemented the text-based virtual coach Sam [2] that introduced itself as wanting to help people prepare to quit smoking and become more physically active, with the latter possibly aiding the former. In each session, Sam proposed one of 24 preparatory activities and provided motivational support for doing the activity [6]. This included motivating people to do their next activity based on the persuasive strategies of commitment, consensus, and authority by Cialdini [16] and action planning [24], as well as giving compliments for spending much effort on activities and responding empathetically otherwise. To facilitate the interaction, users primarily communicated by clicking on buttons with answer options. To avoid repetitiveness of utterances [14], there were different formulations that Sam randomly chose from.
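The selection of feedback utterances described above can be illustrated with a minimal Python sketch. It is not Sam's actual implementation: the utterance texts, the 0 to 10 effort scale, and the cutoff `EFFORT_THRESHOLD` are all assumptions made for illustration only.

```python
import random

# Hypothetical utterance variants; the wording is illustrative, not Sam's actual text.
COMPLIMENTS = [
    "Great job putting so much effort into your activity!",
    "Well done, you really worked hard on this one.",
]
EMPATHY = [
    "That's okay, some activities are harder to fit in than others.",
    "Thanks for being honest. Maybe the next activity suits you better.",
]

# Assumed cutoff on an assumed 0-10 effort scale (not taken from the study).
EFFORT_THRESHOLD = 5


def feedback(effort: int, rng: random.Random) -> str:
    """Return a randomly chosen formulation: a compliment for high
    effort, an empathetic response otherwise."""
    pool = COMPLIMENTS if effort >= EFFORT_THRESHOLD else EMPATHY
    return rng.choice(pool)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    print(feedback(8, rng))
    print(feedback(2, rng))
```

Keeping several formulations per response type and drawing one at random is a simple way to vary wording across sessions without changing the underlying dialog logic.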

Measures. Instead of asking participants about their general attitude toward Sam, we asked two specific questions that would allow people to reflect on concrete elements of interacting with Sam and allow us to identify underlying concerns. Specifically, using an adaptation of the acceptance questions by Provoost et al. [36] similar to Albers et al. [8], we measured participants’ willingness to continue working and their relationship with Sam. For both of these, participants provided a rating on a scale from -5 to 5, with 0 being neutral, as well as a free-text response to the question “Why do you think so?”

To explore the relationship between user variables and attitudes toward Sam, we measured participants’ quitter self-identity with three items (e.g., “I see myself as someone who quits smoking”) based on Meijer et al. [32] as well as participants’ ease of and motivation to do preparatory activities with two items each (e.g., “It was easy to do the assigned activities.”).

Participants. Eligible were people who reported smoking tobacco products at least once per day, contemplating or preparing to quit smoking [20], being fluent in English, and not being part of another intervention to quit smoking. Participants further had an approval rate of at least 90% and at least one previous submission on Prolific and provided informed consent. In total, 1406 people started the study, and 500 people successfully answered all attitude questions. Of these 500 participants, 247 were female (49.4%), 244 were male (48.8%), and 9 (1.8%) provided other or no data on their gender. Moreover, 397 (79.4%) participants indicated having previously at least once quit smoking for at least 24 h. The age of participants ranged from 18 to 74, with 43.8% of participants being between 18 and 30 years old. Participants who successfully completed a study component were paid based on the minimum payment rules on Prolific (i.e., five pounds sterling per hour).

Procedure. After completing a prescreening and a pre-questionnaire, users interacted with Sam in five sessions, which lasted about five to eight minutes and were about two to five days apart. Two days after completing the last session, users were invited to a post-questionnaire in which they answered questions about their attitudes toward Sam.

Analysis strategies. We took a mixed-methods approach with four steps to analyze the data. These were the thematic analysis steps by Braun and Clarke [15], supplemented with triangulation based on quantitative data and literature. We now describe the four analysis steps in detail.

Step 1: Preparation of coding scheme. We created two separate coding schemes, one for the question on “willingness to continue” and one for the question on “relationship”. More precisely, after familiarizing themselves with the data, NLA and MA created draft coding schemes for the free-text responses on “willingness to continue” and “relationship”, respectively. The codes were generated both inductively and deductively based on literature. More information on this process can be found in the reports by Aretz [11] and Ali [9]. The final coding schemes are available online [4].

Step 2: Coding of responses. Using the resulting coding schemes, NLA and MA coded all responses for their respective question. We assessed the reliability of the coding schemes through double coding. A second coder, AE for “willingness to continue” and NLA for “relationship”, was trained on between 20 and 30 responses before independently coding the remaining responses. We obtained an average Cohen’s κ of 0.59 for “willingness to continue” and 0.62 for “relationship”, which indicates moderate to substantial agreement [28]. The two coders of a question then resolved all disagreements by means of discussion.
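As an illustration of the agreement measure used in this step, the following Python sketch computes Cohen's κ for two coders who each mark whether a single code applies to a response. The example labels are invented, and the sketch assumes that the reported average is taken over per-code κ values; it is not the authors' analysis code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two coders' labels."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented data: 1 = code applies to the response, 0 = it does not.
    coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
    coder_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
    print(round(cohens_kappa(coder_1, coder_2), 2))
```

In this invented example the coders agree on 8 of 10 responses (observed agreement 0.80) while chance agreement is 0.52, giving κ = 0.28 / 0.48 ≈ 0.58.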

Step 3: Triangulation with literature and quantitative results. As we gained insights from the qualitative data, we turned to the literature and our quantitative data to triangulate our results. Our quantitative analysis included calculating the mean rating and 95% Highest Density Interval (HDI) per attitude question, as well as computing correlations between user variables and attitude ratings. We used the rethinking [31] and BayesianFirstAid [12] R-packages for this.
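To make the notion of a Highest Density Interval concrete, the Python sketch below computes the narrowest interval that contains a given share of a set of samples. It assumes that `samples` are draws from a posterior distribution (for example, of a mean rating) and is only an illustration; the analysis itself relied on the R packages named above.

```python
import math
from typing import Sequence, Tuple


def hdi(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Return the narrowest interval covering at least `mass` of the samples."""
    if not 0 < mass < 1:
        raise ValueError("mass must lie strictly between 0 and 1.")
    ordered = sorted(samples)
    n = len(ordered)
    if n < 2:
        raise ValueError("At least two samples are required.")
    # Number of consecutive sorted samples the interval must span.
    window = max(2, math.ceil(mass * n))
    best_low, best_high = ordered[0], ordered[window - 1]
    # Slide the window over the sorted samples and keep the narrowest one.
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


if __name__ == "__main__":
    # Invented, right-skewed samples: the HDI excludes the outlier at 100.
    draws = [float(i) for i in range(19)] + [100.0]
    print(hdi(draws, mass=0.9))
```

Unlike an equal-tailed interval, the HDI for a skewed distribution shifts toward the region where the samples are densest.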

Step 4: Search, review, and definition of themes, as well as production of the report. Lastly, we examined the results to identify overarching themes and selected participant responses to illustrate these themes.

3 Results

Participants’ attitudes toward Sam were overall positive, with a mean score of 2.42 (95%-HDI = [2.18, 2.65]) for “willingness to continue” and one of 0.46 (95%-HDI = [0.24, 0.67]) for “relationship” based on scales from −5 to 5. From our mixed-methods analysis we found eight themes, three for people’s willingness to continue working and five for their relationship with Sam. To facilitate their discussion, we grouped these themes into the four topics that we now describe.

Human vs. AI. When describing their willingness to continue working with Sam, several participants referred to a lack of authenticity in their interactions with Sam (N = 82). This included the mere fact that Sam was artificial and not a human (e.g., P354, P378, P448), as well as specific consequences such as repetitiveness (e.g., P310, P410), lack of emotion (e.g., P341), and inability to respond to free-text responses from users (e.g., P465). Some participants also mentioned how they felt when interacting with an artificial agent: “Felt silly talking to a computer plus it did not help me because I knew it was not real” (P441).

With regard to participants’ relationships, the perception of Sam as either human or artificial also played a role (N = 325). While some participants referred to Sam as a “he”, “guy”, “someone”, somebody with a “character”, or even a “friend” (e.g., P185, P196, P214), others regarded Sam as artificial. Terms used to refer to Sam’s artificial nature included “machine” (e.g., P223), “computer” (e.g., P218), “agent” (e.g., P197), “artificial intelligence” (P62), “communication tool” (e.g., P234) and “bot” (e.g., P217): “I found him a nice “person” but also I think about the fact it’s not a real person, it is artificial intelligence working” (P62). Interestingly, one participant mentioned that Sam asking them about their mood at the start of each conversation was pleasant even though they knew that Sam was artificial: “I was well aware that I wasn’t talking to a real person ... However, I still appreciated the effort in asking me about my day and so on, it made the whole thing feel more realistic and the conversation more pleasant” (P218). Something participants appreciated about Sam’s artificial nature was the resulting anonymity: “i loved the anonymity of it, that no one i know would know what i’d said so i could be totally open and honest ...” (P227).

Characteristics of the virtual coach. The second theme in participants’ responses about their willingness to continue working with Sam is Sam’s caring character (N = 352). Participants described Sam as motivating (e.g., P478, P485, P494), friendly (e.g., P47, P98, P107), comforting (e.g., P202, P227, P345), supporting (e.g., P65, P158, P370), understanding (e.g., P198, P227, P253), and unbiased (e.g., P38, P289, P311): “... being in contact with Sam gave a feeling of support it helped a great deal towards the motivation to carry on” (P161).

Participants also referred to Sam’s positive characteristics when describing their relationship with Sam (N = 153). Sam was seen as friendly (e.g., P297, P301), guiding (e.g., P336, P356), trustworthy (e.g., P201, P227, P453), warm (e.g., P68, P241, P341), caring (e.g., P73, P83, P140), welcoming (e.g., P120, P272, P380), and unprejudiced (e.g., P158, P201, P226), amongst others: “Friendly and just nice. You have a problem, Sam suggests activities to help you overcome it” (P403).

Interaction. The third theme concerning participants’ willingness to continue working with Sam is the content of the interactions (N = 183). Participants found the interactions helpful (e.g., P178, P189, P191) and interesting (e.g., P228, P291, P303). Moreover, some participants specifically mentioned that Sam proposed good ideas or activities (e.g., P336, P405, P414) and provided clear explanations: “Sam explains clearly its minds, the activities and the final goals ...” (P256). The relationship between the activities Sam proposed and participants’ willingness to continue is also confirmed by our quantitative analysis. Specifically, we found moderate correlations of 0.33 (95%-HDI = [0.25, 0.41]) and 0.48 (95%-HDI = [0.40, 0.54]), respectively, between participants’ willingness to continue and their ease of and motivation to do the activities. The less than small [17] correlation between quitter self-identity and “willingness to continue” (Mean = 0.09, 95%-HDI = [0, 0.19]) could possibly be explained by the interaction content being seen as somewhat more useful by participants with a higher motivation to change.
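The size labels used for these correlations can be reproduced with a short Python sketch. It assumes that [17] refers to Cohen's conventional thresholds for correlation coefficients (0.10 small, 0.30 medium, 0.50 large); the function name and label strings are illustrative.

```python
def correlation_size(r: float) -> str:
    """Label a correlation coefficient using Cohen's conventional
    thresholds (assumed here: 0.10 small, 0.30 medium, 0.50 large)."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    if magnitude >= 0.1:
        return "small"
    return "less than small"


if __name__ == "__main__":
    # Mean correlations reported above for "willingness to continue".
    for name, r in [("ease", 0.33), ("motivation", 0.48), ("quitter self-identity", 0.09)]:
        print(f"{name}: r = {r:.2f} -> {correlation_size(r)}")
```

Under these thresholds, the correlations with ease (0.33) and motivation (0.48) fall into the medium ("moderate") band, while the one with quitter self-identity (0.09) stays below the threshold for a small effect.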

Similarly, participants referred to characteristics of the chat when describing their relationship with Sam (N = 83). Participants with a positive opinion about the chat mentioned that Sam responded well (e.g., P76, P438), such as by being polite (e.g., P472) or showing an understanding of what the user wrote (e.g., P70). Several participants also pointed out that they felt like chatting with a friend (e.g., P121, P484): “It felt like talking with a friend, since at first he asks how you are, it feels like he cares about you” (P130). Several participants did, however, also voice objections to the chat. For example, some participants noted that the answer options provided as buttons were too limited (e.g., P320, P434), the options’ language did not fit how they speak (P400), and the use of buttons was limiting compared to entering free text (e.g., P370, P398): “Having the pre selected options to answer Sam meant that I couldn’t really respond as I would really ...” (P63). Furthermore, Sam’s responses appeared automated and generic to some participants (e.g., P327, P334, P444). And looking at conversations as a whole, these were sometimes perceived as repetitive or monotonous (e.g., P28, P162): “... all our chats followed the same pattern ...” (P219).

Relationship. Participants’ descriptions of their relations with Sam can be regarded as either positive, neutral, or negative (N = 194). Positive relations included ones where Sam was referred to as a “good relation” (P35), “friend” (P233), “more than just a stranger” (P287), or somebody they felt like knowing for a long time (P232). Participants with neutral or negative relations, on the other hand, regarded Sam as a “professional worker” (P186), “neither a stranger or close friend” (P340), or “not a close friend” (P40). Moreover, some participants pointed out that it was not necessary to be friends with Sam (P48) and that it was not possible to “create a relation with a robot” (P277).

Another theme related to participants’ relationships with Sam is that Sam was sometimes experienced as not personal enough or lacking personality (N = 53). Participants mentioned that they had “no personal connection” with Sam (P470) or did not “know Sam personally” (P338). Reasons for this feeling of not knowing Sam were that Sam “shows no emotion nor can show empathy” (P160), that they did not “know his voice or how he looks” (P448), and that Sam is a “non-thinking individual” (P326) with a different “level of consciousness” (P489). Furthermore, participants pointed out that Sam did not know them (e.g., P12, P15): “I didn’t feel like it was someone who knew me or wanted to get to know me and I think it would be good to develop something a little more personalised” (P20).

4 Conclusion and Recommendations

Based on a mixed-methods analysis of users’ willingness to continue working and their relationship with a virtual coach that provides motivational support in the context of preparing to quit smoking, we identified eight themes describing users’ underlying attitudes. We now use these themes to formulate literature-based recommendations to help designers of virtual behavior change coaches.

Given participants’ views on interacting with Sam, we recommend reducing the repetitiveness of the utterances and the conversation structure, which can increase engagement, enjoyment, and motivation to engage in proposed behaviors [14, 19]. Comprehensive answer options should further be provided for closed questions, as also found by Issom et al. [26]. Moreover, in line with theories on technology acceptance [39] and our previous findings based on people’s experiences with preparatory activities and views on interaction scenarios for a virtual coach [7], participants who found the content more useful were more willing to continue using the virtual coach. So it is important that interaction content is perceived as useful by users.

To help users feel like they and the virtual coach know each other, it might be beneficial to let the virtual coach disclose more about itself and ask users more personal questions so they can also self-disclose. Self-disclosure is an important element of relationship formation in that disclosure of more intimate information has a positive effect on the quality of relationships [38]. Yet, user privacy needs to be protected [40]. Also responding empathetically to free-text utterances may further help to form and maintain a relationship [14, 19, 26].

Lastly, given participants’ perception of Sam as either human or artificial, it may be helpful to make the conversations more human-like, although human skills such as learning from past conversations are still open challenges [19]. For example, asking users how they were doing was appreciated. Yet, users need to know that they are communicating with a virtual coach and not a human [30]. Explaining to users that the virtual coach aims to build a relationship to increase the intervention’s effectiveness, and how it does so, may further foster credibility and transparency.