1 Introduction

Argumentation is the process of reaching a consensus through agreement and disagreement [27] and is effective for persuasion and consensus building. Recently, argumentation systems have been developed that improve persuasiveness by generating an argument between two agents and letting the user listen to it [25]. Advanced coordination between two robots in a conversation can make the conversation more attractive [12] and allow it to continue without breaking down, even when speech recognition fails [2, 14]. We believe that such advanced coordination between robots can also enhance the persuasiveness of robotic discussions. Such robots are expected to be useful in situations where changing opinions matters, such as improving lifestyle habits. However, it is unclear what tactics robots can employ to alter users' opinions.

Regarding changes in users' opinions, Itahara et al. evaluated users' impressions after asking them to observe a consensus-building conversation between two robots [15] that discussed affirmative (positive) and counter (negative) arguments about a topic. They compared four scenarios: two in which one of the robots changed its opinion to the opposite side, and two in which both robots discussed only one side of the topic. They found that having the user observe the robots build consensus on a negative opinion can make the user's impression of the topic more negative, and that a conversation containing only negative opinions gives an impression of unfairness. However, strategies for deliberately producing these effects have not yet been investigated, and an interactive setting was not evaluated because the users only watched a video.

In this study, we investigated whether users' opinions can be changed by a consensus-building dialogue between robots that is shaped by the user's own opinion. We assumed a situation in which two robots discuss a topic from opposing stances. In this situation, we tested the effects of two strategies: first, a strategy in which the robot holding a different opinion from the user comes to agree with the user's opinion; and second, a strategy in which the robot holding the same opinion as the user comes to agree with the opposing opinion. We tested the following hypotheses.

H1: When the robots show the user convergence toward the same opinion as the user, the user's confidence in their opinion increases more than when they do not.

H2: When the robots show the user convergence toward a different opinion from the user, the user's confidence in their opinion decreases more than when they do not.

To verify these hypotheses, we propose a system that develops a discussion between robots according to the user's choice using an argumentation structure [24]. We conducted two experiments in which users conversed with two virtual robot agents in a crowdsourced setting. We evaluated the degree of change in users' opinions before and after the conversation and analyzed the relationship between the effects of the dialogue strategies and the users' confidence in their opinions before the conversation.

The remainder of this paper is organized as follows. Section 2 describes related work on dialogue systems using multiple robots and on dialogue strategies for persuasive conversations. Section 3 describes a system that allows the user to listen to a discussion based on their opinion. Section 4 details the first experiment, on the dialogue strategy in which a robot agrees with the user's stance. Section 5 details the second experiment, on the dialogue strategy in which a robot agrees with the stance opposite to the user's. Section 6 discusses the effectiveness and limitations of these strategies, and Sect. 7 concludes the paper.

2 Related Work

2.1 Studies on Coordination Between Robots

Incorporating multiple robots into human–robot conversation is a promising approach to controlling the conversation easily. Arimoto et al. [2] reported that adding a second robot decreased users' feelings of being ignored and the perceived difficulty of the conversation. Iio et al. developed predefined turn-taking patterns for multiparty conversations using multiple robots; the technique remains effective regardless of the user's answer to a robot's question [14]. Sugiyama et al. proposed a dialogue strategy in which a robot asks a question of a side-participating robot, which helps mitigate the user's perception of a breakdown in the dialogue [30]. Khalifa et al. demonstrated an English-language learning system using robots in which a second robot (an advanced-learner robot) constrained the user's utterances by completing their sentences, thereby improving speech-recognition performance [17]. In this study, we focus on consensus building in discussions and investigate what kind of coordination between robots can increase or decrease the user's confidence in their opinion.

2.2 Studies on Argumentation Systems

Recently, argumentation systems have been developed to discuss various topics [10, 29]. Such systems handle argumentation based on computational models, such as argumentation structures [23, 31, 32]. AIFdb [18] is one of the largest argumentation databases currently available. Sakai et al. created large argumentation structures for dialogue systems [24], and argumentation systems that can discuss various topics with users have been developed based on these structures [1, 13]. These systems follow a discussion by tracing the argumentation structure. In addition, some studies have attempted to generate discussions between multiple agents from argumentation structures. For example, Rach et al. extracted argumentation structures from online texts and generated agent–agent argumentation dialogues [22]. Mitsuda et al. used a large-scale language model to generate agent–agent discussions with natural utterance texts from argumentation structures [20]. They evaluated the coherence and naturalness of the utterance texts; however, the influence on changes in user opinion was not examined.

2.3 Studies on Social Influence by Others

Humans are easily affected by others. For example, when dummy participants deliberately make incorrect choices, the target participant also tends to make incorrect choices [4]. The same effect has been observed when the dummy participants are small humanoid robots [28]. Moreover, a review on social influence [8] shows that susceptibility to influence varies with social relations, such as authority. It has also been reported that not only the rational, deliberative aspect but also the emotional, interpersonal aspect is an important factor in consensus building [16]. Susceptibility to persuasion further varies with the user's own personality traits; Bickmore et al. reported that extroverted people are more likely to respond to persuasion from embodied agents, whereas introverts are more likely to respond to persuasion over the phone [5]. In our study, a robot holding the same stance as the user and a robot holding the opposite stance argue with each other. We investigated whether users' opinions are affected when they indirectly observe consensus building between the robots, one of which can be perceived as similar to the user.

Several studies have investigated persuasion using humanoid robots [3, 19]. Saunderson et al. [26] showed that interaction strategies in which a robot expresses emotions are more persuasive than logical or neutral strategies in a robot persuasion task. In their study, one robot behaved in a neutral manner as a baseline while the other behaved differently depending on the condition; however, the robots did not interact with each other in an advanced manner. Winkle et al. [33] reported that a strategy emphasizing similarity to the user and a strategy based on goodwill significantly increased the number of exercise repetitions achieved through persuasion compared to a strategy emphasizing expertise. They also focused on similarity between a robot and a user, as we do in terms of argumentation stance, but they did not examine conversations between multiple robots. Other studies have aimed to improve persuasiveness by focusing not only on conversational content but also on nonverbal cues such as voice, gestures, and eye gaze [7, 11]. Thus, persuasion by social robots rests not only on verbal interaction but also on other factors, such as nonverbal information. We are particularly interested in the effects of conversational strategies when robots coordinate as a group.

3 Dialogue System

Fig. 1

System architecture. The two robots' discussion scenarios are generated by the scenario-generation module based on the argumentation structure and the scenario template. The dialogue manager runs a discussion scenario. The robots mainly talk to each other by voice and occasionally ask the user questions. Note that, to make the robots' utterances easy to follow, the utterance texts are shown near the robots. The user clicks the button indicating their response to the robots' question, and based on the response the dialogue manager selects the appropriate scenario to execute

We developed an argumentation system with two robots that made the user feel that the robots built a consensus based on their opinions. Figure 1 shows the architecture of the discussion system.

Fig. 2

Subtree of the argumentation structure. The nodes represent utterances corresponding to claims, propositions, and premises, and the edges between the nodes represent the relationship between utterances. For example, for the topic “pros and cons for auto-driving,” there are two conflicting claims (good or bad). The claim “Auto-driving is good” is supported by two propositions, such as “Auto-driving will ease traffic jams.” Each proposition is also supported by two premises, such as “Auto-driving will reduce unnecessary stops and decelerations.”

The scenario-generation module generates discussion scenarios using an argumentation structure [24] and a scenario template [20]. The argumentation structure is a graph structure that represents a discussion of two conflicting stances on a certain topic. In the argumentation structure, the nodes represent utterances corresponding to claims, propositions, and premises. The edges between the nodes represent the relationship between utterances, which are marked as either supportive or non-supportive.

To generate a discussion scenario, a subtree of the structure shown in Fig. 2 is randomly extracted from the original argumentation structure. Specifically, one root node (main proposition), two main issue nodes (opposing stances), four viewpoint nodes (two supportive nodes under each main issue node, representing conversational topics), and eight premise nodes (two supportive nodes under each viewpoint node, representing reasons) are extracted. Using this subtree, a discussion scenario is generated from the template [20], in which the two robots build a consensus on the topic based on the user's opinion. Table 1 shows the detailed flow used in scenario generation, which corresponds to Experiment 1, described later. In Experiment 2, the discussion flow is the same except for the speaker robot in lines 7–15: if the user selects stance A, R2 claims an utterance from a premise node and R1 agrees or disagrees with R2. Notably, if the user answers "no" to the question in line 11, the robots repeat the discussion of lines 5–11. To make the utterance sentences in the nodes sound as natural as spoken language, we applied an automatic conversion of Japanese sentence-end expressions [21].
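For concreteness, the following minimal sketch illustrates one way this fixed-size subtree (one root, two main issues, four viewpoints, eight premises) could be sampled. The graph representation (a dict mapping each node to a list of (child, relation) pairs) is our own assumption for illustration, not the actual data format of the argumentation structure [24].

```python
import random

def supportive_children(graph, node):
    """Return the children connected to `node` by a supportive edge."""
    return [child for child, relation in graph[node] if relation == "support"]

def extract_subtree(graph, root):
    """Sample 1 root, 2 main issues, 4 viewpoints, and 8 premises (sketch)."""
    subtree = {"root": root, "issues": {}}
    for issue, _ in graph[root]:  # the two conflicting stances under the root
        viewpoints = random.sample(supportive_children(graph, issue), 2)
        subtree["issues"][issue] = {
            viewpoint: random.sample(supportive_children(graph, viewpoint), 2)
            for viewpoint in viewpoints
        }
    return subtree
```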

In addition, the module automatically generates the robots' behavior, such as gaze shifts and gestures, based on handcrafted rules and adds it to the scenario. For example, when one robot speaks to the other, the speaking robot turns its head and eyes toward the other robot, and the addressed robot looks back at the speaker. As a result, dialogue scenarios describing the robots' utterances, gazes, gestures, and execution timings are obtained.
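As a rough illustration only, such a mutual-gaze rule could be encoded as below; the event fields and command names are hypothetical and do not reflect the system's actual scenario format.

```python
def add_mutual_gaze(events):
    """Add mutual-gaze commands for robot-directed utterances (sketch).

    Each event is assumed to be a dict with 'speaker', 'addressee',
    and 'start_time' keys; the command vocabulary is hypothetical.
    """
    gaze_commands = []
    for event in events:
        if event["addressee"] != "user":
            # The speaker looks at the addressee, and the addressed
            # robot looks back at the speaker.
            for robot, target in ((event["speaker"], event["addressee"]),
                                  (event["addressee"], event["speaker"])):
                gaze_commands.append({"robot": robot,
                                      "action": "gaze_at",
                                      "target": target,
                                      "time": event["start_time"]})
    return events + gaze_commands
```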

Table 1 Discussion flow used in the scenario generation

The dialogue manager parses the scenario, schedules commands for the robots, and then executes the commands as planned. In this study, the user responds to the robots using buttons. When the user presses a button, the dialogue manager selects the appropriate scenario and executes it.
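The control flow of the dialogue manager can be pictured roughly as follows; all names (schedule, execute, wait_for_button, and the scenario keys) are hypothetical placeholders rather than the system's actual API.

```python
def run_discussion(manager, scenarios):
    """Rough sketch of the dialogue manager loop (all names hypothetical)."""
    scenario = scenarios["opening"]
    while scenario is not None:
        commands = manager.schedule(scenario)  # parse and time the commands
        manager.execute(commands)              # robots speak, gaze, and gesture
        if scenario.ends_with_question:
            # The answer buttons are shown only while a robot waits.
            answer = manager.wait_for_button()  # e.g., "stance_A" or "yes"
            scenario = scenarios.get(answer)    # pick the matching scenario
        else:
            scenario = None
```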

4 Experiment 1

To verify the effectiveness of the dialogue strategy in which the robots converge to the same opinion as the user, a subject experiment was conducted via crowdsourcing. Our hypothesis was (H1) that when the robots show the user convergence toward the same opinion as the user, the user's confidence in their opinion increases more than when they do not. We prepared two conditions: agree-with-same-stance (AgreeSS) and disagree-with-same-stance (DisagreeSS). We evaluated the changes in opinions before and after the discussion, as well as the participants' impressions of the discussion. All procedures in both experiments, including Experiment 2, were approved by the ethics committee of the Graduate School of Engineering Science, Osaka University, Japan.

4.1 Method

4.1.1 Subjects

A total of 297 adults registered with a Japanese crowdsourcing service participated in the experiment. To ensure that only reliable data were collected, we used data only from subjects with high reliability scores for task achievement in the crowdsourcing system; specifically, we included only people who had completed more than 95% of their previous tasks without problems. In addition, we excluded participants who did not pass the attention-check questions testing whether they remembered the dialogue. Finally, we included 236 participants (104 males and 132 females, ranging in age from their teens to their seventies, with an average age of 41.4 years (\(SD=10.6\))). All subjects experienced a conversation with the virtual robot agents under both conditions, and the order of the two conditions was randomized.

4.1.2 Apparatus

Fig. 3

Scene of the conversation with the virtual robot agents on the experimental site. The robots were placed on the right and left sides of the drawing panel. When a robot spoke, the utterance text was shown in a speech balloon above the robot. Two buttons were placed under the panel and were shown only while a robot waited for an answer. Notably, only Japanese text was shown on the experimental site

The experiment was conducted over the Internet. A crowdsourcing site was used to recruit participants and to collect their completed questionnaires. The experimental site, a separate website, provided the conversation experience for the user evaluation and the questionnaire. The experimental site consisted of three sections: the first explained the experiment and verified that the browser could run it correctly; the second presented a drawing panel showing the virtual robot agents, as shown in Fig. 3; and the third contained the questionnaire.

The robots had the same shape as a tabletop humanoid robot called CommU (developed through a collaboration between Osaka University, Japan, and Vstone Co., Ltd.). When a robot spoke, a wave file synthesized by text-to-speech software was played while the robot's mouth moved in synchrony. The two robots' voices were different, allowing the user to easily distinguish them. With 14 degrees of freedom, the robot can produce various nonverbal behaviors, such as gaze shifts and nodding.

4.1.3 Stimuli

Table 2 Example dialogues

The robots discussed topics from conflicting stances. During the discussion, the robots asked the participant to choose their stance. Based on the participant's answer, the robot with the same opinion as the user claimed the premises. For each premise, the other robot responded according to the condition: in the AgreeSS condition, the robot with the different opinion expressed agreement, whereas in the DisagreeSS condition, it expressed disagreement. Examples of dialogues for each condition are listed in Table 2. To keep the participant's attention, after the condition-dependent exchange (line 11 in Table 1), the robots inserted a question to check whether the participant was listening carefully, namely, "Hey, did the user listen to us carefully?".
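For concreteness, the sketch below summarizes which robot claims a premise and how the other robot responds under each condition, covering both this experiment and the opposite-stance conditions introduced later in Sect. 5. The robot labels, stance mapping, and reply strings (paraphrasing the simple responses noted in Sect. 6.2) are illustrative assumptions rather than the system's actual implementation.

```python
def build_exchange(user_stance, condition):
    """Sketch of one condition-dependent exchange (labels are illustrative).

    user_stance: "A" or "B"; condition: "AgreeSS", "DisagreeSS",
    "AgreeOS", or "DisagreeOS". R1 is assumed to argue stance A and
    R2 stance B.
    """
    same = "R1" if user_stance == "A" else "R2"
    other = "R2" if same == "R1" else "R1"

    if condition.endswith("SS"):
        claimer, responder = same, other    # Experiment 1: same-stance robot claims
    else:
        claimer, responder = other, same    # Experiment 2: opposite-stance robot claims

    reply = "That is correct." if condition.startswith("Agree") else "I don't think so."
    return claimer, responder, reply

# Example: build_exchange("A", "AgreeSS") -> ("R1", "R2", "That is correct.")
```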

We used the following five topics, corresponding to five argumentation structures [24]: "Do you accept auto-driving: yes or no?", "Where do you prefer to live: the countryside or the city?", "Which is the best place to visit in Japan: Hokkaido or Okinawa?", "Which is the best breakfast: bread or rice?", and "Which is the best theme park: Tokyo Disney Resort or Universal Studios Japan?". Based on these structures, we prepared five subtrees for each topic (25 subtrees in total). The conversation time was approximately three minutes.

4.1.4 Procedure

On the crowdsourcing site, the participants read an explanation of the experiment and decided whether to participate. They then moved to the experimental site and read the instructions at the top of the first page. The instructions explained how to proceed with the experiment, such as how to go through the robots' dialogue and the questionnaire. They also informed the participants that the questionnaire page would appear after the dialogue page and that they should concentrate on the dialogue because the questionnaire asked about its content. Furthermore, the instructions on the first page included a description of informed consent and indicated that participants should click the next button if they decided to participate, or leave the page otherwise.

Subsequently, the participants completed a questionnaire on their background with conversational robots. They then checked the speaker volume of the device used in the experiment. On the next page, the participants were instructed to talk with the conversational robots about the topics using the two buttons. The same page showed the five topics to be discussed and the questionnaire to be completed after the conversation. At the end of the same page, the participants were asked to enter their prior opinions about the five topics.

Subsequently, the participants talked with the robots. If the dialogue was interrupted (e.g., owing to network errors), the participant could restart it by pressing a retry button below the drawing panel. After completing the dialogue, the participants answered the questionnaire. They then experienced the conversation under the other condition and completed the questionnaire again. Finally, they downloaded their completed answers and submitted them to the crowdsourcing site.

4.1.5 Measurement

A visual analog scale was used to measure the degree of confidence in the opinion, both to assess the degree of change in opinion and to check whether prior confidence levels affected the subjects' scores. The questionnaire item was "For the topic of [topic name], click on the point on the line that comes closest to the degree of your opinion." The subjects rated the item with a slider ranging from \(-100\) (stance A) to 100 (stance B); the midpoint value of zero corresponded to "undecided." Prior confidence was calculated as the absolute value of the score. To calculate the degree of change, the difference between the scores after and before the conversation was multiplied by \(-1\) if the participant had selected stance A before the conversation. For example, when a subject rated 50 (stance B with moderate confidence) before the conversation and 100 (stance B with high confidence) after it, the difference was \(100-50=50\); when they rated 10 (stance B with low confidence) after the conversation, the difference was \(10-50=-40\). When a subject rated \(-50\) (stance A with moderate confidence) before the conversation and \(-100\) (stance A with high confidence) after it, the difference was \((-100-(-50))\times (-1)=50\); when they rated \(-10\) after the conversation, the difference was \((-10-(-50))\times (-1)=-40\). The questions were asked before and after the conversations. Notably, the participants rated this scale for all topics before the interaction, to select the topics of the conversation, but only for the discussed topic after the session, to assess the degree of change in opinion.
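The sign convention described above can be summarized in a small helper function (the name is ours); positive values always indicate increased confidence in the initially chosen stance. The paper does not specify how a prior score of exactly zero (undecided) is handled, so the sketch leaves it as the raw difference.

```python
def degree_of_change(prior, posterior):
    """Signed change in confidence toward the initially chosen stance.

    Scores range from -100 (stance A) to 100 (stance B). If the prior
    score lies on the stance-A side (negative), the raw difference is
    multiplied by -1 so that positive values always mean increased
    confidence in the prior stance. The prior == 0 case is unspecified
    in the paper and defaults here to the raw difference.
    """
    diff = posterior - prior
    return -diff if prior < 0 else diff

# Worked examples from the text:
# degree_of_change(50, 100)   ->  50
# degree_of_change(50, 10)    -> -40
# degree_of_change(-50, -100) ->  50
# degree_of_change(-50, -10)  -> -40
```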

To evaluate whether prior confidence affected the degree of change, a linear mixed model (LMM) was computed. The dependent variable was the degree of change calculated with the above formula. The fixed effects were condition (1 = AgreeSS, 0 = DisagreeSS) and binarized prior confidence (1 = High, 0 = Low), where the threshold separating High and Low prior confidence was 50. The random effect was the subject ID.
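As a concrete (but assumed) realization, the model could be fit with the formula interface of Python's statsmodels roughly as follows; the column names are placeholders, and the original analysis may have used different software.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_change_lmm(df: pd.DataFrame):
    """Fit the LMM sketched above (column names are placeholders).

    Expected columns:
      change      -- degree of change in opinion (dependent variable)
      condition   -- 1 = AgreeSS, 0 = DisagreeSS
      prior_high  -- 1 = High prior confidence (threshold 50), 0 = Low
      subject_id  -- random-effect grouping factor
    """
    model = smf.mixedlm("change ~ condition * prior_high",
                        data=df, groups=df["subject_id"])
    return model.fit()
```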

To evaluate users’ impressions, we prepared the following follow-up questionnaire:

Persuasiveness: Do you think that the discussion between the robots is persuasive?

Agreeableness: Do you think that the discussion between the robots is agreeable?

Realization: Did you find realizations about the topic after participating in the robots' discussion?

Deepness: Did you deepen your understanding of the topic after participating in the robots' discussion?

Confidence: Did your own opinion become more confident after participating in the robots' discussion?

Change: Was your own opinion changed after participating in the robots' discussion?

Participants rated the aforementioned items on a seven-point Likert scale ranging from 1 (“strongly disagree”) to 7 (“strongly agree”). The midpoint value of four corresponded to “undecided.” We also used the LMM to evaluate whether prior confidence could affect the questionnaire scores.

For the attention check, we prepared the question "Did you understand the robots' conversation?" The participants rated this item on a seven-point Likert scale, and we only included subjects who rated it higher than 4. In addition, we prepared a question about the dialogue content: "Please select the two robots' opinions that appeared in the conversation." We prepared four answer candidates: the two viewpoint nodes that appeared in the conversation and two other viewpoint nodes that did not. We only used data from subjects who selected both correct viewpoints.

4.2 Results

Fig. 4

Boxplots of the degree of change in opinion between the prior and posterior confidence in the user's opinion

Table 3 Results of the linear mixed model for degree of change in opinion

Figure 4 shows boxplots of the degree of change in opinion between the prior and posterior confidence in the user's opinion. As shown in Table 3, the LMM revealed a significant interaction between condition and prior confidence. We therefore performed a simple slope analysis to investigate whether the relationship (simple slope) between the degree of change in opinion and one independent variable was significant at each level of the other. The slope within the DisagreeSS condition (simple slope \(=-23.378\), Std. Error \(=5.676\), t-value \(=-4.119\), p-value \(<.001\)) and the slope within the high-confidence group (simple slope \(=27.323\), Std. Error \(=6.312\), t-value \(=4.329\), p-value \(<.001\)) were significant, indicating that these simple slopes were not zero. Since we assigned the dummy value 1 to AgreeSS and High confidence and 0 to DisagreeSS and Low confidence, these results suggest that the degree of change in opinion in the DisagreeSS condition at high confidence was lower than both that in the DisagreeSS condition at low confidence and that in the AgreeSS condition at high confidence.
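To make the simple-slope interpretation explicit, and assuming standard dummy coding, the fitted model can be written as \(\widehat{\Delta } = \beta _0 + \beta _1\,\mathrm{cond} + \beta _2\,\mathrm{conf} + \beta _3\,(\mathrm{cond}\times \mathrm{conf})\), where \(\mathrm{cond}=1\) for AgreeSS and \(\mathrm{conf}=1\) for high prior confidence. Under this assumed notation, the reported slope within the DisagreeSS condition corresponds to \(\beta _2\approx -23.378\) (the effect of prior confidence when \(\mathrm{cond}=0\)), and the reported slope within the high-confidence group corresponds to \(\beta _1+\beta _3\approx 27.323\) (the effect of condition when \(\mathrm{conf}=1\)).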

Table 4 Results of the linear mixed model for each questionnaire item

Table 4 lists the LMM results for the questionnaire items after applying the Bonferroni correction. As shown in the table, the condition factor had significant main effects on persuasiveness, agreeableness, deepness, and confidence. Since the corresponding coefficients of the condition factor were greater than zero, these results suggest that the scores in the AgreeSS condition were higher than those in the DisagreeSS condition.

The simple slope analysis showed that under the DisagreeSS dialogue strategy, that is, when the robot with a different opinion from the user disagreed with the user's opinion, subjects with higher prior confidence decreased their posterior confidence significantly more than those with lower prior confidence. It was also found that, for subjects with higher prior confidence, the AgreeSS strategy increased posterior confidence more than the DisagreeSS strategy did. The follow-up questionnaire showed that persuasiveness, agreeableness, deepness, and confidence in the AgreeSS condition were significantly higher than in the DisagreeSS condition. These results suggest that the subjects in the AgreeSS condition felt that the discussion was more convincing and persuasive because their stance was supported by the other robot. Because the robots converged on the same opinion as the subject, the subjects did not gain new insights or change their opinions, but they felt that the discussion was deeper, and consequently their confidence in their own opinion was strengthened.

5 Experiment 2

To verify the effectiveness of the dialogue strategy in which the robots converge to a different opinion from the user, another experiment was conducted. Our hypothesis was (H2) that when the robots show the user convergence toward a different opinion from the user, the user's confidence in their opinion decreases more than when they do not. We prepared two conditions: agree-with-opposite-stance (AgreeOS) and disagree-with-opposite-stance (DisagreeOS). We again evaluated changes in opinions and impressions of the discussion.

5.1 Method

5.1.1 Subjects

A total of 292 adults registered with the same Japanese crowdsourcing service participated in the experiment. None of the participants in this experiment had participated in Experiment 1. The same procedure as in Experiment 1 was used to collect reliable data. Finally, we included 224 subjects (90 males and 134 females, ranging in age from their teens to their seventies, with an average age of 41.6 years (\(SD=11.1\))). All subjects experienced the conversations with the virtual robot agents under both conditions, and the order of the two conditions was randomized.

5.1.2 Apparatus

The setup for Experiment 2 was the same as that used in Experiment 1.

5.1.3 Stimuli

Table 5 Example dialogue extracted from lines 5–10 of the discussion flow

During the dialogue, the robots discussed a topic from conflicting stances and asked the subjects to choose their stance; the flow of asking participants about their stance was the same as in Experiment 1. Based on the subject's answer, the robot with a different opinion claimed the premises. For each premise, the other robot responded according to the condition: in the AgreeOS condition, the robot with the same opinion as the subject expressed agreement, whereas in the DisagreeOS condition, it expressed disagreement. Examples of dialogues for each condition are listed in Table 5.

The topics and durations of the conversations were the same as in Experiment 1.

5.1.4 Procedure

The procedure for Experiment 2 was the same as that for Experiment 1.

5.1.5 Measurement

The measurements in Experiment 2 were identical to those in Experiment 1.

5.2 Results

Fig. 5

Boxplots of the degree of change in opinion between the prior and posterior confidence in the opinion

Table 6 Results of the linear mixed model for the degree of change in opinion

Figure 5 shows boxplots of the degree of change in opinion between the prior and posterior confidence in the user's opinion. As shown in Table 6, the LMM revealed a significant interaction between condition and prior confidence in the opinion. The simple slope analysis showed that the slope within the AgreeOS condition (simple slope \(=-26.379\), Std. Error \(=6.324\), t-value \(=-4.171\), p-value \(<.001\)) and the slope within the high-confidence group (simple slope \(=-16.296\), Std. Error \(=7.359\), t-value \(=-2.214\), p-value \(=.028\)) were significant. These results suggest that the degree of change in opinion in the AgreeOS condition at the high confidence level was lower than both that in the AgreeOS condition at the low confidence level and that in the DisagreeOS condition at the high confidence level.

Table 7 Results of the linear mixed model for each questionnaire item

Table 7 lists the LMM results for the questionnaire items. As shown in the table, significant main effects of the condition factor were found for persuasiveness, agreeableness, deepness, and change, suggesting that the scores in the AgreeOS condition were higher than those in the DisagreeOS condition.

The simple slope analysis showed that under the AgreeOS dialogue strategy, that is, when the robot with the same opinion as the user agreed with the robot holding a different opinion, users with higher prior confidence reduced their posterior confidence significantly more than those with lower prior confidence. It was also found that the AgreeOS strategy reduced posterior confidence more than the DisagreeOS strategy did, but only for users with higher prior confidence. The follow-up questionnaire also showed that persuasiveness, agreeableness, deepness, and change in the AgreeOS condition were higher than in the DisagreeOS condition. These results suggest that the subjects in the AgreeOS condition felt that the discussion was more convincing and persuasive because the claiming robot was supported by the other robot, even though the converged opinion differed from the user's. In addition, they felt that the discussion was deeper, and therefore their opinions changed.

6 Discussion

6.1 Effects of the Dialogue Strategy

In the present study, we tested the following hypotheses.

H1: When the robots show the user convergence toward the same opinion as the user, the user's confidence in their opinion increases more than when they do not.

H2: When the robots show the user convergence toward a different opinion from the user, the user's confidence in their opinion decreases more than when they do not.

Hypothesis 1 was partially supported by the results of Experiment 1; that is, the effect of the robots' consensus was observed for users with higher confidence in their prior opinions. The results of Experiment 2 likewise partially supported Hypothesis 2. In the higher initial-confidence group, the range between the maximum and minimum of the AgreeSS boxplot in Fig. 4 was narrower than in the other conditions, and a similar pattern was observed for DisagreeOS in Fig. 5. This suggests that people with higher initial confidence in their opinions are generally less likely to change them. Nevertheless, dialogue strategies that demonstrate convergence based on the user's opinion were found to be effective for users with higher confidence in their prior opinions.

We originally assumed that robot consensus would be effective for those with lower confidence in their prior opinions; however, our results did not support this assumption. We assumed that users who gave lower scores for their opinions were less confident, but we may instead have identified users who were simply not interested in the topic. The follow-up questionnaire confirmed an overall effect on the impression of the dialogue; however, regarding the change in opinion, robot consensus may not exert a strong influence unless users are interested in the topic. Therefore, we believe that our experimental results are consistent with both hypotheses if we restrict attention to users who are interested in the topics.

These findings suggest that it is possible to create robots that can persuade or negotiate with users more easily by increasing or decreasing the users' confidence in their opinions. The results also show that such sophisticated cooperation between robots can affect user impressions, which should accelerate the development of dialogue strategies using multiple robots. Note that the dialogue template is independent of the dialogue domain and can therefore be applied to dialogues in various domains. In addition, the argumentation structure used in this study had only a small number of nodes, suggesting that a persuasive discussion can be provided without significant cost.

6.2 Limitations

In this study, no significant changes in opinion were observed for users with lower levels of confidence. We consider that those with lower confidence fall into two categories: those who could not decide because of extreme anxiety and those who could not decide because of low interest in the topic. Our discussion method may have worked well for the former; unfortunately, we could not identify such people in our data. This must be examined in additional research.

For control purposes, the robots' displays of agreement or disagreement in each condition used only simple responses, namely "That is correct" and "I don't think so." However, it may be possible to increase the sense of conviction and persuasiveness by having the user observe a more in-depth discussion, such as multiple exchanges on a single viewpoint. This may also explain why the opinions of users with lower confidence did not change. In the future, it would be worth investigating more advanced discussion scenarios to increase persuasiveness. In particular, adding another condition in which the robot supports its response with another premise, that is, saying "I don't think so because [reason from the other stance]" instead of simply "I don't think so," is important future work. Moreover, because the argumentation structure [24] used in this study includes nodes for rebuttals, investigating the effects of rebuttals should be straightforward.

In this study, a visual analog scale was adopted to measure the confidence level of the opinion because we aimed to capture the strength of opinion about two opposing stances on a single dimension and to facilitate detecting differences before and after a conversation. However, the consistency of this quantification of confidence was not evaluated, although many studies have confirmed the validity of the visual analog scale [6, 9]. This consistency should be checked in the future.

A difference between this study and the previous study [15] is that the previous evaluation was not conducted in an interactive setting, whereas ours was. However, the effect of interactivity itself was not directly evaluated. Comparing conditions with and without interactivity remains future work.

In addition, two arguments and two premises were discussed in a single conversation. These numbers were chosen so that agreement or disagreement could be shown repeatedly. However, our results may be limited to these settings, and increasing the number of statements might affect changes in opinion more strongly, even for people with lower confidence. Furthermore, it is important to investigate other dialogue strategies showing a more complex convergence process, for example, one in which a robot initially disagrees with the other stance but changes its opinion midway to agree with it. The effects of these numbers and dialogue strategies should be investigated in the future.

7 Conclusion

In this study, we verified the effect of a dialogue strategy in which robots converge their opinions based on a user's opinion. Crowdsourced experiments were conducted using virtual robot agents. The results showed that presenting arguments converging on the same opinion as the user increased persuasiveness, agreeableness, and the perceived depth of the discussion, and that people with higher confidence in their prior opinion increased their posterior confidence when the robots converged on the same opinion. Persuasiveness, agreeableness, and perceived depth also increased when the user observed an argument in which the robots converged on an opinion different from the user's; in this case, the user's posterior confidence decreased, especially for those with high prior confidence. Our findings contribute to the study of persuasion employing multiple robots and to the advancement of conversational coordination between robots. Future research will concentrate on creating dialogue patterns that reinforce these benefits through more advanced coordination between robots.