To test our hypotheses about the effects of mimicry and social praise on perceived intelligence and perceived friendliness, we set up an experiment in which participants were asked to chat with a chat-robot for a maximum of 10 min. The chat-robot—named Sara—displayed praise, mimicry, both, or neither. In this section, we describe the experimental setup in more detail.
Participants
Fifty Dutch college undergraduates took part in our experiment (27 males and 23 females). Participants received €5 in gift coupons for their participation, the standard fee at the Technical University of Eindhoven, where the study was conducted. Participants were randomly assigned to one of the conditions of a 2 × 2 (no mimicry/mimicry × no praise/praise) between-subjects factorial design. All participants indicated that they were fluent in English—the language used by the chat-robot—and reported moderate to high computer literacy. The average age of participants was 23.8 years (SD = 5.09). A between-subjects design was chosen because, in a within-subjects design, we expected large order effects and increased fatigue—and thus unreliable data later in the study.
Procedure
Prior to running the experiment, a pilot study was conducted with five pilot participants to test our implementations of the conditions and to make sure all questions were easily understood. Both the pilot study and the experiment were run at the Psychology lab at the Technical University of Eindhoven, the Netherlands: a laboratory with 10 sound-isolated cubicles in which participants can work individually on a PC. Participants were assigned to one of the cubicles and followed the on-screen instructions that guided them through the study.
The first screen presented to participants was the informed consent form. Participants were told they were participating in a study to evaluate the implementation of a chat-robot named Sara. Participants were thus aware that they would be conversing with an artificial agent and not with a human.
Participants were not informed about the different mimicry and praise conditions. They were notified that their participation was voluntary and that they could stop at any time. Furthermore, the on-screen text explained that the data gathered would be used for scientific purposes only.
After obtaining informed consent, the textual instructions introduced Sara. Participants were told that they had a maximum of 10 min to converse with Sara. Sara was introduced as a newly developed chat-robot that was skilled in discussing a number of topics, namely Sport, Geography, Politics and Artificial intelligence. Participants were also told that their conversation would end automatically after 10 min; however, they could end their conversation whenever they wished by clicking a button. During the conversation, a timer displayed the remaining conversation time and after 10 min participants automatically advanced to the next instruction section.
After the conversation, participants were asked to fill out a number of questionnaires; the exact questions are described in the Materials section. Finally, participants were notified that the experiment was over and were asked to leave the cubicle and notify the experimenter. After completion, participants were debriefed and received the €5 reward. They were instructed not to discuss the experiment with their classmates or friends.
Mimicry
Participants were randomly assigned to either the mimicry or no-mimicry condition. In our experimental setting, mimicry was operationalized in the following way:
- In the no-mimicry condition, Sara responded almost instantaneously (response times were shorter than 0.5 s) to any remark made by the participant.
- In the mimicry condition, we recorded the time from the participant's first keystroke until the "Enter" key was pressed or the "Send reply" button was clicked. Sara's reply was then delayed by that same time.
We reasoned that this implementation of mimicry would capture participants' response times while excluding their reading times, which would depend heavily on the complexity and length of Sara's responses.
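The timing scheme above can be sketched in client-side JavaScript, the language in which the conditions were actually implemented. This is a minimal sketch under our own naming assumptions: `createMimicTimer`, `deliverReply`, and `showReply` are hypothetical helpers, not names from the original implementation.

```javascript
// Hypothetical sketch of the mimicry timing logic described above.
function createMimicTimer() {
  let firstKeystrokeAt = null;
  return {
    // Call on every keystroke in the input field; only the first
    // keystroke of a turn starts the clock.
    onKeystroke() {
      if (firstKeystrokeAt === null) firstKeystrokeAt = Date.now();
    },
    // Call when "Enter" is pressed or "Send reply" is clicked;
    // returns the typing duration (ms) by which to delay Sara's reply.
    onSend() {
      const delayMs = firstKeystrokeAt === null ? 0 : Date.now() - firstKeystrokeAt;
      firstKeystrokeAt = null; // reset for the next turn
      return delayMs;
    }
  };
}

// In the mimicry condition the reply is held back for delayMs;
// in the no-mimicry condition delayMs would simply be 0.
function deliverReply(reply, delayMs, showReply) {
  setTimeout(() => showReply(reply), delayMs);
}
```

Because only typing time is recorded, a participant who reads a long response slowly but answers quickly still gets a quick reply, which is the intended exclusion of reading time.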
Praise
The positive social feedback or praise conditions were implemented as follows:
- In the no-praise condition, participants conversed with Sara as implemented by an AJAX extension of the Program E PHP/A.L.I.C.E. implementation—see Materials.
- In the praise condition, we presented a positive feedback message every ten request-response cycles.
Ten was chosen because, as our pilot study showed, this frequency was not overwhelming in the conversation yet still occurred at least twice within every conversation with Sara. The feedback presented was a random selection of one of the following sentences:
1. I really like our conversation a lot.
2. You are a very nice person to talk to.
3. Our conversation is very pleasurable. Thanks for talking to me!
4. You are such a kind person!
5. I really like talking to you.
These sentences were embedded in Sara's answers, placed immediately before her actual response.
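The praise condition described above can be sketched as follows; this is our own illustrative reconstruction (the function names are hypothetical), not the original code.

```javascript
// Hypothetical sketch of the praise injection: every tenth
// request-response cycle, one randomly chosen praise sentence is
// prepended to Sara's answer.
const PRAISE_SENTENCES = [
  "I really like our conversation a lot.",
  "You are a very nice person to talk to.",
  "Our conversation is very pleasurable. Thanks for talking to me!",
  "You are such a kind person!",
  "I really like talking to you."
];
const PRAISE_INTERVAL = 10; // every ten request-response cycles

function createPraiseInjector() {
  let cycle = 0;
  // Takes Sara's raw answer; returns the answer with praise
  // prepended on every tenth cycle, and unchanged otherwise.
  return function inject(answer) {
    cycle += 1;
    if (cycle % PRAISE_INTERVAL !== 0) return answer;
    const praise = PRAISE_SENTENCES[Math.floor(Math.random() * PRAISE_SENTENCES.length)];
    return praise + " " + answer;
  };
}
```

In the no-praise condition the injector would simply never fire, so both conditions can share the same conversation loop.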
In our pilot, we discussed the implementation of the praise condition with the pilot participants, who reported that they did not feel disturbed by the remarks. To further check for suspicion of deceit or for expectancies created by the embedded remarks, we added the open question “What do you believe is the goal of this experiment?” to one of the subsequent questionnaires. This check was built in to prevent participant biases due to prior expectations. None of the participants mentioned social feedback or praise; we are therefore convinced that the remarks felt natural in the human-chat-robot conversation and did not disclose the experimental manipulations.
Materials
In this experiment, we used the Program E implementation of A.L.I.C.E. (Wallace 2009), which is an AIML—Artificial Intelligence Markup Language—interpreter.
While Program E is implemented in PHP, we extended the session management of the standard installation to enable an AJAX approach to managing the conversation. The front end of the application was built in HTML, CSS, and JavaScript. This approach enabled us to implement the mimicry and praise conditions on the client side using JavaScript. We ran a standard installation of Program E with a number of AIML libraries relating to the topics Sport, Geography, Politics, and Artificial intelligence. As mentioned before, the time from the client's AJAX HTTP request to the PHP server's response being rendered to the participant never exceeded 0.5 s in the no-mimicry condition.
Questionnaires
The questionnaires presented to participants after the conversation assessed the following:
1. The perceived friendliness of Sara.
2. The perceived intelligence of Sara.
3. Participants' perceived connectedness to Sara.
4. Remarks on the conversation.
5. Additional measures.
We describe each of these in more detail.
Perceived friendliness
Given the aim of the experiment, the first questions after the conversation concerned the perceived friendliness of Sara. Participants were asked to grade Sara's friendliness on a scale from 1 (very unfriendly) to 10 (very friendly). This 10-point scale corresponds to the Dutch high school grading system and is therefore very natural for most of our participants. Next to this grade, participants also filled out five items regarding Sara's friendliness on a scale from 1 (totally disagree) to 7 (totally agree). We measured friendliness in these two ways to improve the construct validity of our measure: if the results are consistent across both methods of measurement, our confidence that they reflect actual perceived friendliness increases.
All participants rated their agreement to the following items:
1. Sara was friendly during our conversation
2. Compared to humans Sara's interaction style was unfriendly
3. If Sara was a real person I would consider her friendly
4. Compared to humans Sara was polite
5. I really liked Sara
To compute a final friendliness index, item 2 was reverse-scored and an average of the 5 items was computed for each participant (Cronbach's α = 0.762).
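The reverse-and-average scheme (which, as described below, also applies to the perceived intelligence index) can be sketched as follows; the function name is ours, not from the original analysis.

```javascript
// Hypothetical sketch of the index computation: on a 1-7 scale a
// reversed item is scored as 8 - x, and the items are then averaged.
function scaleIndex(items, reversed) {
  // items: array of 1-7 ratings; reversed: set of 0-based indices
  // of reverse-scored items (item 2 -> index 1 for this scale).
  const scored = items.map((x, i) => (reversed.includes(i) ? 8 - x : x));
  return scored.reduce((a, b) => a + b, 0) / scored.length;
}
```

For example, the friendliness ratings [6, 2, 5, 6, 7] become [6, 6, 5, 6, 7] after reversing item 2, giving an index of 6.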
Perceived intelligence
For perceived intelligence, we used a similar approach as for perceived friendliness. First, participants were asked to grade Sara's intelligence on a 10-point scale. Second, we presented the following items (7-point scale):
1. Sara was intelligent
2. Compared to humans Sara seemed dumb
3. If Sara was a real person I would consider her intelligent
4. Compared to humans Sara was smart
Again, we computed a final perceived intelligence index: item 2 was reverse-scored, and for each participant we computed the average score of the 4 items (Cronbach's α = 0.706).
Perceived connectedness
Next to measuring perceived friendliness and perceived intelligence—the constructs of core interest in this study—we added a measure of perceived connectedness (Van Bel et al. 2009). We were interested in whether a higher friendliness score also led to a stronger perceived bond between the user and the chat-robot (Baumeister and Leary 1995). Social connectedness is an emerging construct in the research literature, and we wanted to see whether this measure of a long-term bond was also influenced by the social intelligence manipulations.
Social connectedness is defined as the momentary experience of belongingness and relatedness with others (Van Bel et al. 2009; Kaptein et al. 2010c). Several attempts have been undertaken to assess this experience both quantitatively and qualitatively. In this experiment, social connectedness was measured with the same approach as the perceived friendliness and perceived intelligence measures. Participants were first asked to grade how emotionally connected they felt to Sara on a 10-point scale. Next, the following items were presented on a 7-point scale:
1. I felt connected to Sara
2. Sara and I developed a bond during our conversation
3. I could connect to Sara
4. Sara shared my interest and ideas
5. I felt related to Sara
A social connectedness score was computed by averaging over the 5 items (Cronbach’s α = 0.891).
Remarks on the conversation
After grading the friendliness, intelligence, and connectedness of the chat-robot, we presented a number of open-ended questions to participants. We asked participants to remark on the conversation and to describe a typical good conversation. We also checked their understanding of the study by asking for an explanation of its purpose. These items were added to the study to address possible suspicion of deceit or expectancy.
Additional measures
Next to the questions relating to Sara, we decided to gather a number of background measurements of the participants to be able to identify possible confounding relationships.
One of these measurements was participants' individual susceptibility to persuasive cues, as measured by the questionnaire presented in Kaptein et al. (2009). This is a twelve-item, 7-point Likert scale addressing susceptibility to each of the six principles of persuasion identified by Cialdini (2004) with two items each. This scale has shown its predictive value in estimating participants' compliance with a persuasive request. We included this measure to see whether individuals with a higher susceptibility to persuasive cues would also be more influenced by the social cues of mimicry and praise. One overall susceptibility score was computed for each participant (Cronbach's α = 0.698) (see Appendix).
Next to participants' susceptibility to persuasion, we also administered the TIPI: the Ten Item Personality Inventory (Gosling et al. 2003). The TIPI is a fast and convenient way to measure personality. While not elaborate, we believed the TIPI scores could be used in our experiment to check for confounding effects of participants' personalities on their judgments of Sara's friendliness and intelligence. The TIPI yields a score on each of the 5 dimensions of the Big Five (Goldberg 1990): Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness to experiences.
Finally, we asked for participants' age, gender, and living situation to enable us to control for possible confounds due to these characteristics. In our analysis, we especially focused on gender as a possible confound, since gender differences in the effects of praise have previously been shown empirically (e.g. Burgoon and Klingle 1998).
All participants fully finished the study. The average completion time was 24 min (SD = 4.5).